improvements over unpruned jets in identifying top quarks and W bosons and separating them 
from a QCD background, and may be useful in a search for heavy particles. 
I. INTRODUCTION 



The Large Hadron Collider (LHC) will present an exciting and challenging environment. Efforts to tease out hints of Beyond the Standard Model (BSM) physics from complicated final states, typically dominated by Standard Model (SM) interactions, will almost surely require the use of new techniques applied to familiar quantities. Of particular interest is the question of how we think about hadronic jets at the LHC [1]. Historically, jets have been employed as surrogates for individual short-distance energetic partons that evolve semi-independently into showers of energetic hadrons on their way from the interaction point through the detectors. An accurate reconstruction of the jets in an event then provides an approximate description of the underlying short-distance, hard-scattering kinematics. With this picture in mind, it is not surprising that the internal structure of jets, e.g., the fact that the experimentally detected jets exhibit nonzero masses, has rarely been used in analyses at the Tevatron. However, we can anticipate that large-mass objects, which yield multijet decays at the Tevatron, e.g., W and Z bosons (two jets) or top quarks (three jets), will often be produced with sufficient boosts to appear as single jets at the LHC. Thus the masses of jets and further details of the internal structure of jets will be useful in identifying single jets not only as familiar objects like the aforementioned vector bosons and top quarks, but also as less familiar cascade decays of SUSY particles or the decays of V-particles. In fact, the idea of studying the subjet structure of jets has been around for some time, but initially this study took the form of discussing the number of jets as a function of the jet resolution scale, typically at e+e− colliders, or the pT distribution within the cone of (cone) jets at the Tevatron. (See, for example, the analyses in [31I31|S].) Recently a variety of studies [ilTllHliaiTnillllinillllllllSKIB] have appeared suggesting a range of techniques for identifying jets with specific properties. It is to this discussion that we intend to contribute. Not surprisingly, the current literature focuses on "tagging" the single-jet decays of the particles mentioned above and the Higgs boson. However, since we cannot be certain as to the full spectrum of new physics to be found at the LHC, it is important to keep in mind the underlying goal of separating QCD jets from any other type of jet. This will be challenging, and the diversity of approaches currently being discussed in the literature is essential. Successful searches for new physics at the LHC will likely employ a variety of techniques. The analysis described below presents detailed properties of the "pruning" procedure outlined in [13].

In the following discussion we will focus on jets defined 
by kT-type jet algorithms. The iterative recombination 
structure of these algorithms yields jets that, by defi- 
nition, are assembled from a sequence of protojets, or 
subjets. It is natural to try to use this subjet structure 
(along with the pT and mass of the jet) to distinguish 
different types of jets. A combination of cuts and like- 
lihood methods applied to this subjet structure can be 
used to identify jets, and thus events, likely to be en- 
riched with vector or Higgs bosons, top quarks, or BSM 
physics. Such jet-labeling techniques can then be used in 
conjunction with more familiar jet- and lepton-counting 
methods to isolate new physics at the LHC. 

An essential aspect of high-pT jets at the LHC is that 
the jet algorithm ensures nonzero masses not only for 
the individual jets, but also for the subjets. For recombi- 
nation algorithms, we can analyze the 1 → 2 branching 
structure inherent in the substructure of the jet in terms 
of concepts familiar from usual two-body decays. In fact, 
it is exactly such decays (say from W/Z and top quark 
decays) that we want to compare in the current study to 
the structure of "ordinary" QCD (light quark and gluon) 
jets. As we analyze the internal structure of jets we will 
attempt to keep in mind the various limitations of jets. 
Jets are not intrinsically well-defined, but exhibit (of- 
ten broad) distributions that are shaped by the very al- 
gorithms that define them. Further, true experimental 
QCD jets are not identical to the leading-logarithm 
parton showers produced by Monte Carlos, but also include 
(perturbative) contributions from hard emissions, which 
may be important for precisely the properties of jets we 
want to discuss here, including masses. Finally, the back- 
ground particles from the underlying event, and from 
pile-up at higher luminosities, will influence the prop- 
erties of the jets observed in the detector. 

During the run-up to the LHC, jet substructure has increasingly drawn interest as an analysis tool [BJ [3 HJ [9–16]. The LHC will generate a deluge of multijet events that form a background to most interesting processes, and techniques to separate signals from this background will prove very useful. To that end, various groups have shown that jet substructure can be used in top [Hllinillllllllin] and Higgs [PIU] identification, as well as in the reconstruction of SUSY mass spectra [TJ [Tl].

We take a more general approach below. Instead of 
describing a technique using jet substructure to find a 
particular signal, we study features of recombination al- 
gorithms. We identify major systematic effects in jets 
found with the kT and CA algorithms, and discuss how 
they affect the found jet substructure. To reduce these 
systematic effects we define a generic procedure, which 
we call pruning, that improves the jet substructure for 
the purposes of heavy particle identification. We note 
that pruning is based on the same ideas as other jet sub- 
structure methods, and |TT], in that these techniques 
also modify the jet substructure to improve heavy parti- 
cle identification. Pruning differs from these methods in 
that it is built as a broad jet substructure analysis tool, 
and one that can be used in a variety of searches. To 
this end, the mechanics of the pruning procedure differ 
from other methods, allowing it to be generalized more 
easily. Pruning can be performed using either the CA 
or kT algorithms to generate substructure for a jet, and 
the procedure can be implemented on jets coming from 
any algorithm, since the procedure is independent of the 
jet finder. In the studies below, we will quantify several 
aspects of the performance of pruning to demonstrate its 
utility. 

The following discussion includes a review of jet algorithms (Sec. II) and a review of the expected properties of jets from QCD (Sec. III) and those from heavy particles (Sec. IV). In studying QCD and heavy particle jets, we will discuss key systematic effects imposed on the jet substructure by the jet algorithm itself. In Sec. V we then contrast the expected substructure for QCD and heavy particle jets and describe how the task of separating the two types of jets is complicated by systematic effects of the jet algorithm and the hadronic environment. In Sec. VI we show how these systematic effects can be reduced by a procedure we call "pruning". Secs. VII and VIII describe our Monte Carlo studies of pruning and their results. Additional computational details are provided in an appendix. In Sec. IX we summarize these results and provide concluding remarks.



II. RECOMBINATION ALGORITHMS AND JET SUBSTRUCTURE

Jet algorithms can be broadly divided into two categories, recombination algorithms and cone algorithms [1]. Both types of algorithms form jets from protojets, which are initially generic objects such as calorimeter towers, topological clusters, or final state particles. Cone algorithms fit protojets within a fixed geometric shape, the cone, and attempt to find stable configurations of those shapes to find jets. In the cone-jet language, "stable" means that the direction of the total four-momentum of the protojets in the cone matches the direction of the axis of the cone. Recombination algorithms, on the other hand, give a prescription to pairwise (re)combine protojets into new protojets, eventually yielding a jet. For the recombination algorithms studied in this work, this prescription is based on an understanding of how the QCD shower operates, so that the recombination algorithm attempts to undo the effects of showering and approximately trace back to objects coming from the hard scattering. The anti-kT algorithm [17] functions more like the original cone algorithms, and its recombination scheme is not designed to backtrack through the QCD shower. Cone algorithms have been the standard in collider experiments, but recombination algorithms are finding more frequent use. Analyses at the Tevatron [TH] have shown that the most common cone and recombination algorithms agree in measurements of jet cross sections.

A general recombination algorithm uses a distance measure $\rho_{ij}$ between protojets to control how they are merged. A "beam distance" $\rho_i$ determines when a protojet should be promoted to a jet. The algorithm proceeds as follows:

0. Form a list L of all protojets to be merged.

1. Calculate the distance between all pairs of protojets in L using the metric $\rho_{ij}$, and the beam distance for each protojet in L using $\rho_i$.

2. Find the smallest overall distance in the set $\{\rho_{ij}, \rho_i\}$.

3. If this smallest distance is a $\rho_{ij}$, merge protojets i and j by adding their four-vectors. Replace the pair of protojets in L with this new merged protojet. If the smallest distance is a $\rho_i$, promote protojet i to a jet and remove it from L.

4. Iterate this process until L is empty, i.e., all protojets have been promoted to jets.¹



¹ This defines an inclusive algorithm. For an exclusive algorithm, there are no promotions; instead of recombining until L is empty, mergings proceed until all $\rho_{ij}$ exceed a fixed $\rho_{\rm cut}$.
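The inclusive procedure in steps 0 to 4 can be sketched in a few lines of Python. This is a minimal, unoptimized illustration under our own assumptions, not the implementation used in this work: protojets are plain (px, py, pz, E) tuples with the beam along z, merging simply adds four-vectors (the E-scheme), and the pairwise metric and beam distance are passed in as functions. The kT and CA choices follow the metrics defined just below, and the anti-kT choice follows the anti-kT footnote in Sec. II A; all function names are hypothetical.

```python
import math
from typing import Callable, List, Tuple

FourVec = Tuple[float, float, float, float]  # (px, py, pz, E), beam along z


def from_pt_y_phi(pt: float, y: float, phi: float, m: float = 0.0) -> FourVec:
    """Build a four-vector from transverse momentum, rapidity, azimuth and mass."""
    mt = math.sqrt(pt * pt + m * m)
    return (pt * math.cos(phi), pt * math.sin(phi), mt * math.sinh(y), mt * math.cosh(y))


def pt(p: FourVec) -> float:
    return math.hypot(p[0], p[1])


def rapidity(p: FourVec) -> float:
    return 0.5 * math.log((p[3] + p[2]) / (p[3] - p[2]))


def delta_r(p: FourVec, q: FourVec) -> float:
    """Boost-invariant angle Delta R_ij, with the azimuthal difference wrapped into [0, pi]."""
    dphi = abs(math.atan2(p[1], p[0]) - math.atan2(q[1], q[0]))
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(dphi, rapidity(p) - rapidity(q))


# Pairwise metrics rho_ij(p, q, D) and beam distances rho_i(p).
def kt_metric(p, q, D):
    return min(pt(p), pt(q)) * delta_r(p, q) / D

def kt_beam(p):
    return pt(p)

def ca_metric(p, q, D):
    return delta_r(p, q) / D

def ca_beam(p):
    return 1.0

def antikt_metric(p, q, D):
    return min(1.0 / pt(p), 1.0 / pt(q)) * delta_r(p, q) / D

def antikt_beam(p):
    return 1.0 / pt(p)


def cluster(protojets: List[FourVec], metric: Callable, beam: Callable, D: float):
    """Inclusive recombination: returns the jets and the (distance, action) history."""
    L = list(protojets)                              # step 0
    jets, history = [], []
    while L:                                         # step 4: iterate until L is empty
        best = None                                  # (distance, i, j); j is None for a beam distance
        for i in range(len(L)):                      # step 1: all rho_i and rho_ij
            cand = [(beam(L[i]), i, None)]
            cand += [(metric(L[i], L[j], D), i, j) for j in range(i + 1, len(L))]
            for c in cand:                           # step 2: keep the smallest distance
                if best is None or c[0] < best[0]:
                    best = c
        d, i, j = best
        if j is None:                                # step 3: promote protojet i to a jet
            jets.append(L.pop(i))
            history.append((d, "promote"))
        else:                                        # step 3: merge i and j by adding four-vectors
            merged = tuple(a + b for a, b in zip(L[i], L[j]))
            L.pop(j)                                 # j > i, so remove j first
            L.pop(i)
            L.append(merged)
            history.append((d, "merge"))
    return jets, history
```

For example, with D = 1.0 and three massless protojets at zero rapidity (pT = 100 at φ = 0, pT = 1 at φ = 0.3, pT = 50 at φ = 2.5), each of the three metrics first merges the soft protojet into the hard one and then promotes two jets. The O(N³) double loop is chosen for readability; production code uses geometric nearest-neighbour tricks instead.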
For the kT [H [201 IH] and Cambridge-Aachen (CA) [22] recombination algorithms the metrics are

$$
\begin{aligned}
k_T&:\quad \rho_{ij} = \min(p_{T_i}, p_{T_j})\,\frac{\Delta R_{ij}}{D}, \qquad \rho_i = p_{T_i},\\
\text{CA}&:\quad \rho_{ij} = \frac{\Delta R_{ij}}{D}, \qquad \rho_i = 1.
\end{aligned} \tag{1}
$$

Here $p_{T_i}$ is the transverse momentum of protojet i and $\Delta R_{ij}^2 = (\phi_i - \phi_j)^2 + (y_i - y_j)^2$ is a measure of the angle between two protojets that is invariant under boosts along and rotations around the beam direction. $\phi$ is the azimuthal angle around the beam direction, $\phi = \tan^{-1}(p_y/p_x)$, and $y$ is the rapidity, $y = \tanh^{-1}(p_z/E)$, with the beam along the z-axis. The angular parameter D governs when protojets should be promoted to jets: it determines when a protojet's beam distance is less than the distance to other objects. D provides a rough measure of the typical angular size (in $y$-$\phi$) of the resulting jets.

The recombination metric $\rho_{ij}$ determines the order in which protojets are merged in the jet, with recombinations that minimize the metric performed first. From the definitions of the recombination metrics in Eq. (1), it is clear that the kT algorithm tends to merge low-pT protojets earlier, while the CA algorithm merges pairs in strict angular order. This distinction will be very important in our subsequent discussion.



A. Jet Substructure

A recombination algorithm naturally defines substructure for the jet. The sequence of recombinations tells us how to construct the jet in step-by-step 2 → 1 mergings, and we can unfold the jet into two, three, or more subjets by undoing the last recombinations. Because the jet algorithm begins and ends with physically meaningful information (starting at calorimeter cells, for example, and ending at jets), the intermediate (subjet) information generated by the kT and CA (but not the anti-kT²) recombination algorithms is expected to have physical significance as well. In particular, we expect the earliest recombinations to approximately reconstruct the QCD shower, while the last recombinations in the algorithm, those involving the largest-pT degrees of freedom, may indicate whether the jet was produced by QCD alone or by a heavy particle decay plus QCD showering. To discuss the details of jet substructure, we begin by defining relevant variables.

² The anti-kT algorithm has the metrics $\rho_{ij} = \min(p_{T_i}^{-1}, p_{T_j}^{-1})\,\Delta R_{ij}/D$, $\rho_i = p_{T_i}^{-1}$, so it tends to cluster protojets with the hardest protojet, resulting in cone-like jets with uninteresting substructure.



B. Variables Describing Branchings and Their Kinematics

In studying the substructure produced by jet algorithms, it will be useful to describe branchings using a set of kinematic variables. Since we will consider the substructure both of (massive) jets reconstructing kinematic decays and of QCD jets, there are two natural choices of variables. Jet rest frame variables are useful for understanding decays because the decay cross section takes a simple form in them. Lab frame variables invariant under boosts along and rotations around the beam direction are useful because jet algorithms are formulated in terms of these variables, so algorithm systematics are most easily understood in terms of them. The QCD soft/collinear singularity structure is also easy to express in lab frame variables. We describe these two sets of variables and the relationship between them in this subsection.

Naively, there are twelve variables completely describing a 1 → 2 splitting. Here we will focus on the top branching (the last merging) of the jet splitting into two daughter subjets, which we will label J → 1, 2. Imposing the four constraints from momentum conservation on the branching leaves eight independent variables. The invariance of the algorithm metrics under longitudinal boosts and azimuthal rotations removes two of these (they are irrelevant). For simplicity we will use this invariance to set the jet's direction to be along the x-axis, with the z-axis along the beam direction. Therefore there are six relevant variables needed to describe a 1 → 2 branching. Three of these variables are related to the three-momenta of the jet and subjets, and the other three are related to their masses.

The two sets of variables we will use to understand jet substructure share common elements. Of the six variables, only one needs to be dimensionful, and we can describe all other scales in terms of this one. The dimensionful variable we choose is the mass $m_J$ of the jet. In addition, we use the masses of the two daughter subjets scaled by the jet mass:

$$ a_1 = \frac{m_1}{m_J} \quad \text{and} \quad a_2 = \frac{m_2}{m_J}. \tag{2} $$

We choose the particle labeled by '1' to be the heavier particle, $a_1 \geq a_2$. The three masses, $m_J$, $a_1$, and $a_2$, will be common to both sets of variables. Additionally, we will typically want to fix the pT of the jet and determine how the kinematics of a system change as pT is varied. For QCD, a useful dimensionless quantity is the ratio of the mass and pT of the jet, whose square we call $x_J$:

$$ x_J = \frac{m_J^2}{p_{T_J}^2}. \tag{3} $$



For decays, we will opt instead to use the familiar magnitude $\gamma$ of the boost of the heavy particle from its rest frame to the lab frame, which is related to $x_J$ by

$$ \gamma = \sqrt{\frac{1}{x_J} + 1}. \tag{4} $$
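For a jet along the x-axis (so its rapidity vanishes and $E_J^2 = p_{T_J}^2 + m_J^2$), Eq. (4) follows in one line from $\gamma = E_J/m_J$ and the definition of $x_J$ in Eq. (3):

```latex
\gamma = \frac{E_J}{m_J}
       = \frac{\sqrt{p_{T_J}^2 + m_J^2}}{m_J}
       = \sqrt{\frac{p_{T_J}^2}{m_J^2} + 1}
       = \sqrt{\frac{1}{x_J} + 1},
\qquad\text{equivalently}\qquad
x_J = \frac{1}{\gamma^2 - 1} = \frac{1}{\gamma^2\beta^2}.
```

For example, $\gamma = 2$ corresponds to $x_J = 1/3$, and large boosts give $x_J \approx 1/\gamma^2$.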
The remaining two variables, which are related to the 
momenta of the subjets, will differ between the rest frame 
and lab frame descriptions of the splitting. 

Unpolarized 1 → 2 decays are naturally described in their rest frame by two angles. These angles are the polar and azimuthal angles of one particle (the heavier one, say) with respect to the direction of the boost to the lab frame, and we label them $\theta_0$ and $\phi_0$, respectively. Since we are choosing that the final jet be in the x direction, $\theta_0$ is measured from the x direction, while $\phi_0$ is the angle in the y-z plane, which we choose to be measured from the y direction. Putting these variables together, the set that most intuitively describes a heavy particle decay is the "rest frame" set

$$ \{m_J,\ a_1,\ a_2,\ \gamma,\ \cos\theta_0,\ \phi_0\}. \tag{5} $$



The requirement that the (last) recombination vertex being described actually "fit" in a single jet reconstructed in the lab frame yields the constraint $\Delta R_{12} < D$, where $\Delta R_{12}$ is treated as a function of the variables in Eq. (5).

Consider describing the same kinematics in the lab frame. As noted above, we want to choose variables that are invariant under longitudinal boosts and azimuthal rotations, which can be mapped onto the recombination metrics of the jet algorithm. The angle $\Delta R_{12}$ between the daughter particles is a natural choice, as is the ratio of the minimum daughter pT to the parent pT, which is commonly called $z$:

$$ z = \frac{\min(p_{T_1}, p_{T_2})}{p_{T_J}}. \tag{6} $$



These variables make the recombination metrics for the kT and CA algorithms simple (up to the common factor $1/D$ of Eq. (1)):

$$ \rho_{12}(k_T) = p_{T_J}\, z\, \Delta R_{12} \quad\text{and}\quad \rho_{12}(\text{CA}) = \Delta R_{12}. \tag{7} $$

Note that for a generic recombination, the momentum factors in the denominator of Eq. (6) and in the kT metric in Eq. (7) should be $p_{T_p}$, the transverse momentum of the parent, or combined subjet, of the 2 → 1 recombination.

From these considerations we choose to describe recombinations in the lab frame with the set of variables

$$ \{m_J,\ a_1,\ a_2,\ x_J,\ z,\ \Delta R_{12}\}. \tag{8} $$
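As a concrete illustration of Eq. (8), the sketch below computes the lab frame set for the last merging from the two daughter four-vectors. It assumes that the jet is the plain four-vector sum of its two subjets and uses the (px, py, pz, E) convention with the beam along z; the function names are hypothetical.

```python
import math


def from_pt_y_phi(pt, y, phi, m=0.0):
    """Four-vector (px, py, pz, E) from pT, rapidity, azimuth and mass."""
    mt = math.sqrt(pt * pt + m * m)
    return (pt * math.cos(phi), pt * math.sin(phi), mt * math.sinh(y), mt * math.cosh(y))


def pt(p):
    return math.hypot(p[0], p[1])


def mass(p):
    m2 = p[3] ** 2 - p[0] ** 2 - p[1] ** 2 - p[2] ** 2
    return math.sqrt(max(m2, 0.0))  # clip tiny negative values from rounding


def rapidity(p):
    return 0.5 * math.log((p[3] + p[2]) / (p[3] - p[2]))


def delta_r(p, q):
    dphi = abs(math.atan2(p[1], p[0]) - math.atan2(q[1], q[0]))
    dphi = min(dphi, 2.0 * math.pi - dphi)  # wrap into [0, pi]
    return math.hypot(dphi, rapidity(p) - rapidity(q))


def lab_frame_variables(p1, p2):
    """Return {m_J, a_1, a_2, x_J, z, Delta R_12} of Eq. (8) for the splitting J -> 1, 2."""
    pj = tuple(a + b for a, b in zip(p1, p2))
    mj = mass(pj)
    m_heavy, m_light = sorted((mass(p1), mass(p2)), reverse=True)  # label '1' = heavier
    return {
        "mJ": mj,
        "a1": m_heavy / mj,                       # Eq. (2)
        "a2": m_light / mj,
        "xJ": (mj / pt(pj)) ** 2,                 # Eq. (3)
        "z": min(pt(p1), pt(p2)) / pt(pj),        # Eq. (6)
        "dR12": delta_r(p1, p2),
    }
```

Because the subjet labels are sorted by mass inside the function, the caller does not need to know in advance which daughter is heavier.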



In using these variables it is essential to understand the structure of the corresponding phase space, especially for the last two variables in both sets. Naively, for actual decays, we would expect that the phase space in $\cos\theta_0$ and $\phi_0$ of the rest frame variable set in Eq. (5) is simple, with boundaries that are independent of the values of the other variables. However, since we require that the decay "fits" in a jet (so that all the variables are defined), constraints and correlations appear. The presence of these constraints and correlations is more apparent for the lab frame variables $\Delta R_{12}$ and $z$, since the recombination algorithm acts directly on these variables. As a first step in understanding these correlations we plot in Fig. 1 the contour $\Delta R_{12} = D\ (= 1.0)$ in the $(\cos\theta_0, \phi_0)$ phase space for different values of $\gamma$ and over different choices for $a_1$ and $a_2$. These specific values of $a_1$ and $a_2$ correspond to a variety of interesting processes: $a_1 = a_2 = 0$ gives the simplest kinematics and is therefore a useful starting point; $a_1 = 0.46$, $a_2 = 0$ gives the kinematics of the top quark decay; $a_1 = 0.9$, $a_2 = 0$ and $a_1 = 0.3$, $a_2 = 0.1$ are reasonable values for subjet masses from the CA and kT algorithms, respectively. The contour $\Delta R_{12} = D$ defines the boundary in phase space where a 1 → 2 process will no longer fit in a jet, with the interior region corresponding to splittings with $\Delta R_{12} < D$. Note that the contour is nearly straight and vertical, increasingly so for larger $\gamma$. This is a reflection of the fact that $\Delta R_{12}$ is nearly independent of $\phi_0$, up to terms suppressed by powers of $1/\gamma$.
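The near independence of $\Delta R_{12}$ from $\phi_0$ can be checked directly by boosting a two-body decay out of its rest frame. The sketch below uses our own conventions and is not code from this study: the parent has unit mass ($m_J = 1$) and moves along +x with boost $\gamma$; daughter 1 (mass $a_1$) has rest-frame polar angle $\theta_0$ from the x direction and azimuth $\phi_0$ in the y-z plane measured from the y direction; all function names are hypothetical.

```python
import math


def two_body_lab(a1, a2, gamma, cos_theta0, phi0):
    """Lab-frame four-vectors (px, py, pz, E) of the daughters of a unit-mass parent boosted along +x."""
    # Rest-frame momentum of each daughter for parent mass 1 and daughter masses a1, a2.
    pstar = math.sqrt((1 - (a1 + a2) ** 2) * (1 - (a1 - a2) ** 2)) / 2
    sin_theta0 = math.sqrt(1 - cos_theta0 ** 2)
    n = (cos_theta0, sin_theta0 * math.cos(phi0), sin_theta0 * math.sin(phi0))
    beta = math.sqrt(1 - 1 / gamma ** 2)
    daughters = []
    for sign, m in ((+1, a1), (-1, a2)):          # back to back in the rest frame
        px, py, pz = (sign * pstar * c for c in n)
        e = math.sqrt(pstar ** 2 + m ** 2)
        # Boost along x: only px and E change.
        daughters.append((gamma * (px + beta * e), py, pz, gamma * (e + beta * px)))
    return daughters


def pt(p):
    return math.hypot(p[0], p[1])


def rapidity(p):
    return 0.5 * math.log((p[3] + p[2]) / (p[3] - p[2]))


def delta_r(p, q):
    dphi = abs(math.atan2(p[1], p[0]) - math.atan2(q[1], q[0]))
    dphi = min(dphi, 2 * math.pi - dphi)
    return math.hypot(dphi, rapidity(p) - rapidity(q))


def lab_dr_and_z(a1, a2, gamma, cos_theta0, phi0):
    """Lab-frame (Delta R_12, z) for the rest-frame variables of Eq. (5)."""
    p1, p2 = two_body_lab(a1, a2, gamma, cos_theta0, phi0)
    pt_jet = math.hypot(p1[0] + p2[0], p1[1] + p2[1])
    return delta_r(p1, p2), min(pt(p1), pt(p2)) / pt_jet


def fits_in_jet(a1, a2, gamma, cos_theta0, phi0, D=1.0):
    """True inside the Delta R_12 < D contour plotted in Fig. 1."""
    return lab_dr_and_z(a1, a2, gamma, cos_theta0, phi0)[0] < D
```

Scanning `fits_in_jet` over a grid in $(\cos\theta_0, \phi_0)$ at fixed $\gamma$, $a_1$, $a_2$ traces out a boundary of the kind shown in Fig. 1; at $\gamma = 10$ and $\cos\theta_0 = 0$, for instance, massless daughters emitted along y and along z give $\Delta R_{12}$ values that agree to well under a percent.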

While the constraint $\Delta R_{12} < D$ for the 1 → 2 process to fit in a jet becomes simpler in the $(z, \Delta R_{12})$ phase space, the boundaries of the phase space become more complex. In Fig. 2 we plot the available phase space in $(z, \Delta R_{12})$ for the same values of $a_1$, $a_2$, and boost as in Fig. 1, translating each value of $\gamma$ into $x_J$. The most striking feature is that for fixed $x_J$, $a_1$, and $a_2$, the phase space in $(z, \Delta R_{12})$ is nearly one-dimensional; this is again due to the fact that $\Delta R_{12}$ and also $z$ are nearly independent of $\phi_0$. In particular, for $a_1 = a_2 = 0$ (as in Fig. 2a), the phase space approximates the contour describing fixed $x_J$ for small $\Delta R_{12}$, which takes the simple form

$$ x_J = \frac{m_J^2}{p_{T_J}^2} \approx z(1-z)\,\Delta R_{12}^2. \tag{9} $$

This approximation is accurate even for larger angles, $\Delta R_{12} \approx 1$, at the 10% level. Note also that the width of the band about the contour described by Eq. (9) is itself of order $x_J$. As we decrease $x_J$ the band moves down and becomes narrower, as indicated in Fig. 2a.
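The small-angle form of Eq. (9) can be compared with the exact value for two massless subjets. For massless daughters with transverse momenta $p_{T_1}$, $p_{T_2}$ separated by $\Delta y$ and $\Delta\phi$, one has exactly $m_J^2 = 2p_{T_1}p_{T_2}(\cosh\Delta y - \cos\Delta\phi)$ and $p_{T_J}^2 = p_{T_1}^2 + p_{T_2}^2 + 2p_{T_1}p_{T_2}\cos\Delta\phi$. The sketch below (the function name is hypothetical) evaluates both sides of Eq. (9) from these formulas; the accompanying check only covers the small-angle limit.

```python
import math


def xj_exact_and_approx(pt1, pt2, dy, dphi):
    """Exact x_J = m_J^2 / p_TJ^2 for two massless subjets, and the Eq. (9) estimate z(1-z)dR^2."""
    mj2 = 2.0 * pt1 * pt2 * (math.cosh(dy) - math.cos(dphi))
    ptj2 = pt1 ** 2 + pt2 ** 2 + 2.0 * pt1 * pt2 * math.cos(dphi)
    z = min(pt1, pt2) / math.sqrt(ptj2)       # Eq. (6)
    dr2 = dy ** 2 + dphi ** 2
    return mj2 / ptj2, z * (1.0 - z) * dr2


for dr in (0.05, 0.2, 0.5, 1.0):
    exact, approx = xj_exact_and_approx(1.0, 1.0, dr, 0.0)   # separation purely in rapidity
    print(f"dR12 = {dr:4.2f}: exact x_J = {exact:.5f}, Eq. (9) = {approx:.5f}")
```

Since both expressions are closed-form, varying `pt1`, `pt2`, and the split between `dy` and `dphi` is a quick way to see how the approximation degrades as the opening angle grows.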



As illustrated in Figs. 2b and 2d, we can also see a double-band structure to the $(z, \Delta R_{12})$ phase space. The upper band corresponds to the case where the lighter daughter is softer (smaller pT) than the heavier daughter (and determines $z$), while the lower band corresponds to the case where the heavier daughter is softer. This does not occur in Fig. 2a because $a_1 = a_2$ (the single band is double-covered), or in Fig. 2c because the heavier particle is never the softer one for the chosen values of $x_J$.

Note that we have said nothing about the density of 
points in phase space for either pair of variables. This 
is because the weighting of phase space is set by the dy- 
namics of a process, while the boundaries are set by the 
kinematics. Decays and QCD splittings weight the phase 
space differently, as we will show. 



C. Ordering in Recombination Algorithms 

Having laid out variables useful to describe 1 → 2 processes, we can discuss how the jet algorithm orders recombinations in these variables. Recombination algorithms merge objects according to the pairwise metric $\rho_{ij}$. The sequence of recombinations is almost always monotonic in this metric: as the algorithm proceeds, the value typically increases. Only certain kinematic configurations will decrease the metric from one recombination to the next, and the monotonicity violation is small and rare in practice.

[Figure 1 panels: (a) $a_1 = a_2 = 0$; (b) $a_1 = 0.46$, $a_2 = 0$; (c) $a_1 = 0.9$, $a_2 = 0$; (d) $a_1 = 0.3$, $a_2 = 0.1$.]

FIG. 1: Boundaries in the $(\cos\theta_0, \phi_0)$ plane for a recombination step to fit in a jet of size D = 1.0, for several values of the boost $\gamma$ and the subjet masses $\{a_1, a_2\}$. The "interior" region has $\Delta R_{12} < D$.

[Figure 2 panels: (a) $a_1 = a_2 = 0$; (b) $a_1 = 0.46$, $a_2 = 0$; (c) $a_1 = 0.9$, $a_2 = 0$; (d) $a_1 = 0.3$, $a_2 = 0.1$.]

FIG. 2: Boundaries in the $(z, \Delta R_{12})$ plane for a recombination step of fixed $\{a_1, a_2, x_J\}$, for various values of $x_J$ and the subjet masses $\{a_1, a_2\}$. Configurations with $\Delta R_{12} < D$ fit in a jet; D = 1.0 is shown as an example.

This means it is rather straightforward to understand the typical recombinations that occur at different stages of the algorithm. We can think in terms of a phase space boundary: the algorithm enforces a boundary in phase space at a constant value of the recombination metric, which evolves to larger values as the recombination process proceeds. If a recombination occurs at a certain value of the metric, $\rho_0$, then subsequent recombinations are very unlikely to have $\rho_{ij} < \rho_0$, meaning that region of phase space is unavailable for further recombinations.

In Fig. 3 we plot typical boundaries for the CA and kT algorithms in the $(z, \Delta R_{12})$ phase space. For CA, these boundaries are simply lines of constant $\Delta R_{12}$, since the recombination metric is $\rho_{ij}(\text{CA}) = \Delta R_{ij}$. For kT, these boundaries are contours in $z\,\Delta R_{12}$, and implicitly depend on the pT of the parent particle in the splitting. Because the kT recombination metric for $i, j \to p$ is $\rho_{ij}(k_T) = z\,\Delta R_{ij}\,p_{T_p}$, decreasing the value of $p_{T_p}$ will shift the boundary out to larger $z\,\Delta R_{ij}$. These algorithm-dependent ordering effects will be important in understanding the restrictions on the kinematics of the last recombinations in a jet. For instance, we expect to observe no small-angle late recombinations in a jet defined by the CA algorithm.
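The boundaries of Fig. 3 can be written as a simple test on a candidate recombination. The sketch below uses the metrics of Eq. (7) with the common factor $1/D$ dropped and treats the ordering as strictly monotonic, which, as noted above, holds only approximately in practice; the function and argument names are hypothetical.

```python
def excluded_by_ordering(z, dr, rho0, algorithm, pt_parent=None):
    """True if a later recombination with (z, Delta R) lies below the boundary set by an
    earlier recombination at metric value rho0, assuming strictly monotonic ordering."""
    if algorithm == "CA":
        rho = dr                       # rho(CA) = Delta R, independent of z
    elif algorithm == "kT":
        if pt_parent is None:
            raise ValueError("the kT boundary depends on the parent pT")
        rho = z * dr * pt_parent       # rho(kT) = z * Delta R * pT_p
    else:
        raise ValueError(f"unknown algorithm {algorithm!r}")
    return rho < rho0


def kt_boundary(rho0, pt_parent):
    """Value of z * Delta R along the kT boundary; it moves up as pt_parent decreases."""
    return rho0 / pt_parent
```

For CA the test never looks at $z$, which is the code-level statement that late CA recombinations cannot be at small angle, however asymmetric they are.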
FIG. 3: Typical boundaries (red, dashed lines) on phase space due to ordering in the CA and kT algorithms. The shaded region below the boundaries is cut out, and the more heavily shaded regions correspond to earlier stages in the recombination sequence. The cutoff $\Delta R_{ij} = D = 1.0$ is shown for reference (black, dashed line).
D. Studying the Substructure of Recombination Algorithms

In the following sections we discuss various aspects of 
jet substructure, especially as applied toward identifying 
heavy particle decays within single jets and separating 
them from QCD jets. To effectively discriminate between 
jets, we must have an understanding of the substructure 
expected from both QCD and decays. To this end, we will 
study toy models of the underlying 1 → 2 processes with 
appropriate (but approximate) dynamics. We will also 
study the substructure observed in jets found in simu- 
lated events, which include showering and hadronization, 
for both pure QCD and heavy particle decays. In these 
more realistic jets, with many more degrees of freedom, 
we must understand the role of the jet algorithm in de- 
termining the features of the last recombinations in the 
jet. This bias will impact how (and whether) we can in- 
terpret the last recombinations as relevant to the physics 
of the jet. 

We will find that the differences in the metrics of kT and CA will introduce shaping effects on the recombinations. We will observe these in the distributions of kinematic variables of interest, e.g., the jet and subjet masses, $z$, and $\Delta R_{12}$. The major point of this work will be to motivate and develop a method to identify jet substructure most likely to come from the decay of a heavy particle and separate this substructure from recombinations likely to represent QCD.



In Sec. III we study QCD (only) jet masses and substructure in terms of the variables $x_J$, $z$, and $\Delta R_{12}$, starting with a leading-log approximation including only the soft and collinear singularities. We find the distribution in $x_J$ in this approximation and discuss the implications for the substructure in a QCD jet, specifically the distributions in both $z$ and $\Delta R_{12}$ for fixed $x_J$. Finally we look at the jet mass and substructure distributions found in jets from fully simulated events. Of particular interest is the algorithm dependence.

In Sec. IV we first study 1 → 2 decays with fixed boost and massless daughters (e.g., a W decay into quarks) and a top quark decay into massless quarks. The parton-level top quark decay into three quarks, which is made up of two 1 → 2 decays, is instructive because the jet algorithm matters: the CA and kT algorithms can reconstruct the jet in different ways. For both kinds of decays, we consider both the full, unreconstructed decay distributions in $z$ and $\Delta R_{12}$, then proceed to study the shaping effects that reconstruction in a single jet has on the "in-a-jet" distributions of these variables. We also look at the shaping in terms of the rest frame variable $\cos\theta_0$, which provides a good intuitive picture of which decays will be reconstructed in a single jet. Understanding this shaping will be key to understanding the substructure we expect from decays and the effects of the jet algorithm. We contrast this substructure with the expected substructure from QCD jets, pointing out key similarities and differences. Finally we look at the distributions found in fully simulated events of both W and top quark decays.

In Sec. V we compare the results of Secs. III and IV. We also consider the impact of event effects such as the underlying event, which are common to all events. In particular, we focus on understanding how these contributions manifest themselves in the substructure of the jet and the role that the algorithm plays in determining the substructure. We will find that jet algorithms, acting on events that include these contributions, yield substructure that often obscures the recombinations reconstructing a heavy particle decay. This is especially true of the CA algorithm, which we will show has a large systematic effect on its jet substructure. We will use these lessons in later sections to construct the pruning procedure to modify the jet substructure, removing recombinations that are likely to obscure a heavy particle decay.



III. QCD JETS 

The LHC will be the first collider where jet masses play a serious role in analyses. The proton-proton center-of-mass energy at the LHC is sufficiently large that the mass spectrum of QCD jets will extend far into the regime of heavy particle production ($m_W$ and above). Because masses are such an important variable in jet substructure, masses of QCD jets will play an essential role in determining the effectiveness of jet substructure techniques at separating QCD jets from jets with new physics. We expect that the jet mass distribution in QCD is smoothly falling due to the lack of any intrinsic mass scale above $\Lambda_{\rm QCD}$, while jets containing heavy particles are expected to exhibit enhancements in a relatively narrow jet mass range (given by the particle's width, detector effects, and the systematics of the algorithm).

Understanding the more detailed substructure of QCD 
jets (beyond the mass of the jet) presents an interesting 
challenge. QCD jets are typically characterized by the 
soft and collinear kinematic regimes that dominate their 
evolution, but QCD populates the entire phase space of 
allowed kinematics. Due to its immense cross section 
relative to other processes, small effects in QCD can pro- 
duce event rates that still dominate other signals, even 
after cuts. Furthermore, the full kinematic distributions 
in QCD jet substructure currently can only be approxi- 
mately calculated, so we focus on understanding the key 
features of QCD jets and the systematic effects that arise 
from the algorithms that define them. Note that even 
when an on-shell heavy particle is present in a jet, the 
corresponding kinematic decay(s) will contribute to only 
a few of the branchings within the jet. QCD will still be 
responsible for the bulk of the complexity in the jet 
substructure, which is produced as the colored partons shower and 
hadronize, leading to the high multiplicity of color singlet 
particles observed in the detector. 

It is a complex, and somewhat misguided, question to ask whether the jet substructure accurately reconstructs the parton shower, as the parton shower represents colored particles while the experimental algorithm only deals with color singlets. A more sensible question, and an answerable one, is to ask whether the algorithm is faithful to the dynamics of the parton shower. This is the basis of the metrics of the kT and CA recombination algorithms: the ordering of recombinations captures the dominant kinematic features of branchings within the shower. In particular, the cross section for an extra real emission in the parton shower contains both a soft ($z \to 0$) and a collinear ($\Delta R \to 0$) singularity:

$$ \frac{d\sigma}{dz\, d\Delta R} \propto \frac{\alpha_s}{z\,\Delta R}. \tag{10} $$



While these singularities are regulated (in perturbation 
theory) by virtual corrections, the enhancement remains, 
and we expect emissions in the QCD parton shower to 
be dominantly soft and/or collinear. Due to their differ- 
ent metrics, the kx and CA algorithms will recombine 
these emissions differently, producing distinct substruc- 
ture. We will discuss the interplay between the dynamics 
of QCD and the recombination algorithms in the next 
two subsections. In the first, we will consider a sim- 
ple leading-logarithm (LL) approximation to perturba- 
tive QCD jets with just a single branching and zero-mass 
subjets. This will illustrate the simplest kinematics of 
Section II B coupled with soft/collinear dynamics. In the 
second subsection we consider the properties of the more 
realistic QCD jets found in fully simulated events. 



A. Jets in a Toy QCD 

To establish an intuitive understanding of jet substructure in QCD, we consider a toy model description of jets in terms of a single branching and the variables $x_J$, $z$, and $\Delta R_{12}$. We take the jet to have a fixed $p_{TJ}$. We combine the leading-logarithmic dynamics of Eq. (10) with the approximate expression for the jet mass in Eq. (9), and we label this combined approximation the "LL" approximation. Recall that this approximation for the jet mass is useful for small subjet masses and small opening angles. From Section II B, recall that fixing $x_J$ provides lower bounds on both $z$ and $\Delta R_{12}$ and ensures finite results for the LL approximation. This approach leads to the following simple form for the $x_J$ distribution:



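$$\frac{1}{\alpha_s}\frac{d\sigma_{\rm LL}}{d(m_J^2/p_{TJ}^2)} = \frac{1}{\alpha_s}\frac{d\sigma_{\rm LL}}{dx_J} \propto \int \frac{dz\,d\Delta R_{12}}{z\,\Delta R_{12}}\,\delta\!\left(x_J - z(1-z)\Delta R_{12}^2\right) = \frac{1}{2x_J}\ln\!\left[\frac{1}{1-\sqrt{1-4x_J/D^2}}\right]\Theta\!\left[\frac{D^2}{4}-x_J\right]. \tag{11}$$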



Note that we are integrating over the phase space of Fig. 2a, treating it as one-dimensional. The resulting distribution is shown in Fig. 4 for $D = 1.0$, where we have multiplied by a factor of $x_J$ to remove the explicit pole. We observe both the cutoff at $x_J = D^2/4$, arising from the kinematics discussed in Section II B, and the $-\ln(x_J)/x_J$ small-$x_J$ behavior, arising from the singular soft/collinear dynamics. Even if the infrared singularity is regulated by virtual emissions and the distribution is resummed, we still expect QCD jet mass distributions (at fixed $p_{TJ}$) to be peaked at small mass values and to be rapidly cut off for $m_J > p_{TJ}D/2$.
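The closed form in Eq. (11) can be checked directly against the integral it comes from. The sketch below is illustrative only: the function names, the Monte Carlo bin width, the seed, and the lower cutoffs `z_floor` and `r_floor` are our own choices, not part of the original analysis. It exploits the fact that the measure $dz\,d\Delta R_{12}/(z\,\Delta R_{12})$ is flat in $(\ln z, \ln\Delta R_{12})$, and it reports the normalization exactly as written in Eq. (11).

```python
import math
import random

def ll_xj_density(x_j, d=1.0):
    """Closed form on the right-hand side of Eq. (11)."""
    if not 0.0 < x_j < d * d / 4.0:
        return 0.0  # Theta[D^2/4 - x_J]: no massless pair in a jet of size D is heavier
    root = math.sqrt(1.0 - 4.0 * x_j / (d * d))
    return math.log(1.0 / (1.0 - root)) / (2.0 * x_j)

def ll_xj_density_mc(x_j, d=1.0, width=0.004, n=600_000, seed=1,
                     z_floor=0.01, r_floor=0.01):
    """Monte Carlo estimate of the integral in Eq. (11).

    The measure dz dDeltaR / (z DeltaR) is flat in (ln z, ln DeltaR), so we sample
    those uniformly (z < 1/2, DeltaR < D) and count how often z(1-z)DeltaR^2 falls
    in a bin of the given width around x_J. z_floor and r_floor are hypothetical
    cutoffs that must lie below z_min and 2 sqrt(x_J) for the bin of interest."""
    rng = random.Random(seed)
    ln_z = (math.log(z_floor), math.log(0.5))
    ln_r = (math.log(r_floor), math.log(d))
    area = (ln_z[1] - ln_z[0]) * (ln_r[1] - ln_r[0])
    hits = 0
    for _ in range(n):
        z = math.exp(rng.uniform(*ln_z))
        r = math.exp(rng.uniform(*ln_r))
        if abs(z * (1.0 - z) * r * r - x_j) < width / 2.0:
            hits += 1
    return hits / n * area / width

# x_J times the density is the quantity shown in Fig. 4 (pole removed).
for x in (0.01, 0.05, 0.15, 0.24):
    print(f"x_J={x:.2f}  x_J * density={x * ll_xj_density(x):.3f}")
```

At small $x_J$ the product $x_J\,d\sigma/dx_J$ approaches $\tfrac12\ln[D^2/(2x_J)]$, which is the $-\ln x_J$ behavior described above. It then falls to zero at $x_J = D^2/4$.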




FIG. 4: Distribution in $x_J$ in the simple LL toy model with $D = 1.0$.



We can improve this approximation somewhat by using the more quantitative perturbative analysis described in [1]. In perturbation theory, jet masses appear at next-to-leading order (NLO) in the overall jet process, where two (massless) partons can be present in a single jet. Strictly, the jet mass is then being evaluated at leading order (i.e., the jet mass vanishes with only one parton in a jet), and one would prefer an NNLO result to understand the scale dependence (we take $\mu = p_{TJ}/2$). Here we simply use the available NLO tools [23]. This approach leads to the very similar $x_J$ distribution displayed in Fig. 5, plotted for two values of $p_{TJ}$ (at the LHC, with $\sqrt{s} = 14$ TeV). We are correctly including the full NLO matrix element (not simply the singular parts), the full kinematics of the jet mass (not just the small-angle approximation), and the effects of the parton distribution functions. In this case the distribution is normalized by dividing by the Born jet cross section. Again we see the dominant impact of the soft/collinear singularities for small jet masses. Note also that there is little residual dependence on the value of the jet momentum (the distribution approximately scales with $p_{TJ}$), and that again the distribution essentially vanishes for $x_J > 0.25$ ($m_J/p_{TJ} > 0.5$).¹ The average jet mass suggested by these results is $\langle m_J/p_{TJ}\rangle \approx 0.2D$.



¹ The fact that the $x_J$ distribution extends a little past $D^2/4$ arises from the fact that the true $(z, \Delta R_{12})$ phase space is really two-dimensional, and there is still a small allowed phase-space region below $\Delta R_{12} = D$ even when $x_J = D^2/4$.
FIG. 5: NLO distribution in $x_J$ for $k_T$-style QCD jets with $D = 1.0$ and $\sqrt{s} = 14$ TeV, for two values of $p_{TJ}$.



However, because the jet contains only two partons at NLO, we are still ignoring the effects of nonzero subjet masses and of the ordering of mergings imposed by the algorithm itself. For example, at this order there is no difference between the CA and $k_T$ algorithms.

Next we consider the $z$ and $\Delta R_{12}$ distributions in the LL approximation, where a single recombination of two (massless) partons is required to reconstruct a jet of definite $p_T$ and mass (fixed $x_J$). To that end we can "undo" one of the integrals in Eq. (11) and consider the distributions in $z$ and $\Delta R_{12}$. For the $z$ distribution we find



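$$\frac{1}{\alpha_s}\frac{d\sigma_{\rm LL}}{dx_J\,dz} \propto \frac{1}{2zx_J}\,\Theta\!\left[z - \frac{1-\sqrt{1-4x_J/D^2}}{2}\right]\Theta\!\left[\frac{1}{2} - z\right]. \tag{12}$$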

As expected, we see the poles in $z$ and $x_J$ from the soft/collinear dynamics, but, as in Section II B, the constraint of fixed $x_J$ yields a lower limit on $z$. Recall that the upper limit on $z$ arises from its definition, again applied in the small-angle limit. Thus the LL QCD distribution in $z$ is peaked at the lower limit, but the characteristic turn-on point is fixed by the kinematics, which require the branching at fixed $x_J$ to lie in a jet of size $D$. This behavior is illustrated in Fig. 6 for various values of $x_J = 1/(\gamma^2 - 1)$ corresponding to those used in Section IV.
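A short sketch shows how the turn-on point of Eq. (12) moves with the boost. It maps a transverse boost $\gamma$ to $x_J = 1/(\gamma^2-1)$ and evaluates the lower limit $z_{\min} = \tfrac12\left(1-\sqrt{1-4x_J/D^2}\right)$. The function names and the sample boosts are illustrative choices of ours. They are not necessarily the values plotted in Fig. 6.

```python
import math

def x_j_from_boost(gamma):
    """x_J = m_J^2 / p_TJ^2 = 1 / (gamma^2 - 1) for a transversely boosted system."""
    return 1.0 / (gamma * gamma - 1.0)

def qcd_z_min(x_j, d=1.0):
    """Lower limit on z in Eq. (12): z(1-z) D^2 >= x_J with z <= 1/2."""
    disc = 1.0 - 4.0 * x_j / (d * d)
    if disc < 0.0:
        raise ValueError("x_J > D^2/4: no LL branching fits in a jet of size D")
    return 0.5 * (1.0 - math.sqrt(disc))

for gamma in (2.5, 3.0, 5.0, 10.0):  # illustrative boosts, not necessarily those of Fig. 6
    x_j = x_j_from_boost(gamma)
    print(f"gamma={gamma:4.1f}  x_J={x_j:.4f}  z_min={qcd_z_min(x_j):.4f}")
```

For small $x_J$, $z_{\min} \approx x_J/D^2$, so the QCD peak moves toward $z = 0$ as the jet becomes lighter relative to its $p_T$.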

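The expression for the $\Delta R_{12}$ dependence in the LL approximation is

$$\frac{1}{\alpha_s}\frac{d\sigma_{\rm LL}}{dx_J\,d\Delta R_{12}} \propto \frac{2\,\Theta\!\left[\Delta R_{12} - 2\sqrt{x_J}\right]\Theta\!\left[D - \Delta R_{12}\right]}{\Delta R_{12}^2\,\sqrt{\Delta R_{12}^2 - 4x_J}\,\left(1 - \sqrt{1 - 4x_J/\Delta R_{12}^2}\right)}. \tag{13}$$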

This distribution is illustrated in Fig. 7 for the same values of $x_J$ as in Fig. 6. As with the $z$ distribution, the kinematic constraint of being a jet with a definite $x_J$ yields a lower limit, $\Delta R_{12} > 2\sqrt{x_J}$, along with the expected upper limit, $\Delta R_{12} < D$. However, for $\Delta R_{12}$ the change of variables also introduces an (integrable) square-root singularity at the lower limit. This square-root factor tends to be numerically more important than the $1/\Delta R_{12}^2$ factor. (One factor of $\Delta R_{12}$ arises from the collinear QCD dynamics, while the other comes from the change of variables. The soft QCD singularity is contained in the denominator factor $\left(1 - \sqrt{1 - 4x_J/\Delta R_{12}^2}\right) \approx 2z$ for $x_J \ll \Delta R_{12}^2$, or equivalently $z \ll 1$.) Since this square-root singularity arises from the choice of variable (a kinematic effect), we will see that it is also present for heavy particle decays, suggesting that the $\Delta R_{12}$ variable will not be as useful as $z$ in distinguishing QCD jets from heavy particle decay jets.

FIG. 6: Distribution in $z$ for LL QCD jets for $D = 1.0$ and various values of $x_J$. The curves are normalized to have unit area.
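Equations (12) and (13) come from the same integrand, so each must integrate back to Eq. (11). The sketch below checks this numerically and shows that the square-root singularity at $\Delta R_{12} = 2\sqrt{x_J}$ is integrable. The substitution $\Delta R_{12} = 2\sqrt{x_J}\cosh t$ is our own choice: its Jacobian cancels the singular factor, so a plain midpoint rule suffices. The function names are illustrative.

```python
import math

def ll_delta_r_density(delta_r, x_j, d=1.0):
    """Right-hand side of Eq. (13), same normalization as Eq. (12)."""
    if not (2.0 * math.sqrt(x_j) < delta_r < d):
        return 0.0
    soft = 1.0 - math.sqrt(1.0 - 4.0 * x_j / delta_r**2)  # ~ 2z for x_J << DeltaR^2
    return 2.0 / (delta_r**2 * math.sqrt(delta_r**2 - 4.0 * x_j) * soft)

def integrate_over_delta_r(x_j, d=1.0, n=2000):
    """Integrate Eq. (13) over DeltaR_12 using DeltaR_12 = 2 sqrt(x_J) cosh(t).
    The Jacobian 2 sqrt(x_J) sinh(t) cancels the square-root singularity at the
    lower limit, so the midpoint rule in t converges quickly."""
    t_max = math.acosh(d / (2.0 * math.sqrt(x_j)))
    h = t_max / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        delta_r = 2.0 * math.sqrt(x_j) * math.cosh(t)
        jacobian = 2.0 * math.sqrt(x_j) * math.sinh(t)
        total += ll_delta_r_density(delta_r, x_j, d) * jacobian * h
    return total

def integrate_over_z(x_j, d=1.0):
    """Eq. (12) integrated analytically over z_min < z < 1/2 (reproduces Eq. (11))."""
    z_min = 0.5 * (1.0 - math.sqrt(1.0 - 4.0 * x_j / (d * d)))
    return math.log(0.5 / z_min) / (2.0 * x_j)

for x in (0.05, 0.2):
    print(x, integrate_over_delta_r(x), integrate_over_z(x))
```

After the substitution, the integrand reduces to $x_J^{-1}/(1+e^{-2t})$. This smooth function is why the two routes agree to high precision.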
FIG. 7: Distribution in $\Delta R_{12}$ for LL QCD jets for $D = 1.0$ and various values of $x_J$. The curves are normalized to have unit area.



Thus, in our toy QCD model with a single recombination, leading-logarithm dynamics, and the small-angle jet mass definition, the constraints due to fixing $x_J$ tend to dominate the behavior of the $z$ and $\Delta R_{12}$ distributions, with limited dependence on the QCD dynamics and no distinction between the CA and $k_T$ algorithms. However, this situation changes dramatically when we consider more realistic jets with full showering, a subject to which we now turn.



B. Jet Substructure in Simulated QCD events 

To obtain a more realistic understanding of the properties of QCD jet masses, we now consider the jet substructure that arises in more fully simulated events. In particular, we focus on Monte Carlo simulated QCD jets with transverse momenta in the range $p_{TJ} = 500$-$700$ GeV ($c = 1$ throughout this paper), found in matched QCD multijet samples created as described in Appendix [X]. The matching process means that we are including, to a good approximation, the full NLO perturbative probability for energetic, large-angle emissions in the simulated showers, and not just the soft and collinear terms. As suggested earlier, we anticipate two important changes from the previous discussion. First, the showering ensures that the daughter subjets at the last recombination have nonzero masses. More importantly, and as noted in Section II C, the sequence of recombinations generated by the jet algorithm tends to force the final recombination into a particular region of phase space that depends on the recombination metric of the algorithm. For the CA algorithm, this means that the final recombination will tend to have a value of $\Delta R_{12}$ near the limit $D$, while for the $k_T$ algorithm it will have a large value of $z\,\Delta R_{12}\,p_{TJ}$. This issue plays an important role in explaining the observed $z$ and $\Delta R_{12}$ distributions.

First, consider the jet mass distributions from the simulated event samples. In Fig. 8 we plot the jet mass distributions for the $k_T$ and CA algorithms for all jets in the stated $p_T$ bin (500-700 GeV). As expected, for both algorithms the QCD jet mass distribution falls smoothly from a peak only slightly displaced from zero (the remnant of the perturbative $-\ln(m_J^2)/m_J^2$ behavior). There is a more rapid cutoff for $m_J > p_{TJ}D/2$, which corresponds to the expected kinematic cutoff of $m_J = p_{TJ}D/2$ from the LL approximation, smeared by the nonzero width of the $p_T$ bin, the nonzero subjet masses, and the other small corrections to the LL approximation. The average jet mass, $\langle m_J\rangle \sim 100$ GeV, is in crude agreement with the perturbative expectation $\langle m_J/p_{TJ}\rangle \sim 0.2$. Note that the two algorithms now differ somewhat, in that the $k_T$ algorithm displays a slightly larger tail at high masses. As we will see in more detail below, this distinction arises from the difference in the metrics, which leads the $k_T$ algorithm to recombine protojets over a slightly larger angular range. On the other hand, the two curves are remarkably similar; note that we have used a logarithmic scale to make the difference apparent. Without the enhanced number of energetic, large-angle emissions characteristic of this matched sample, the distinction between the two algorithms is much smaller, i.e., a typical dijet, LO Monte Carlo sample yields more similar distributions for the two algorithms.

FIG. 8: Distribution in $m_J$ for QCD jets with $p_T$ between 500 and 700 GeV with $D = 1.0$.
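The scales quoted above can be checked with simple arithmetic, taking $D = 1.0$ and the two edges of the 500-700 GeV bin (this only rescales the LL estimates; it is not a fit to Fig. 8):

$$\langle m_J\rangle \approx 0.2\,D\,p_{TJ}:\qquad 0.2\times 500~\text{GeV} = 100~\text{GeV},\qquad 0.2\times 700~\text{GeV} = 140~\text{GeV};$$

$$m_J^{\rm cut} = \frac{p_{TJ}D}{2}:\qquad \frac{500~\text{GeV}}{2} = 250~\text{GeV},\qquad \frac{700~\text{GeV}}{2} = 350~\text{GeV}.$$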

Other details of the QCD jet substructure are substantially more sensitive to the specific algorithm than the jet mass distribution is. To illustrate this point, we discuss the distributions of $z$, $\Delta R_{12}$, and the subjet masses for the last recombination in the jet. We can understand the observed behavior by combining a simple picture of the geometry of the jet with the constraints that the jet algorithm induces on the phase space for a recombination. In particular, recall that the ordering of recombinations defined by the jet algorithm imposes relevant boundaries on the phase space available to the late recombinations (see Fig. 3).

While the details of how the $k_T$ and CA algorithms recombine protojets within a jet are different, the overall structure of a large-$p_T$ jet is set by the shower dynamics of QCD, i.e., the dominance of soft/collinear emissions. Typically the jet has one (or a few) hard core(s), where a hard core is a localized region in $y$-$\phi$ with large energy deposition. The core is surrounded by regions with substantially smaller energy depositions arising from the radiation emitted by the energetic particles in the core (i.e., the shower), which tend to dominate the area of the jet. In particular, the periphery of the jet is occupied primarily by particles from soft radiation, since even a wide-angle hard parton will radiate soft gluons in its vicinity. This simple picture leads to very different recombinations with the $k_T$ and CA algorithms, especially for the last recombinations.

The CA algorithm orders recombinations only by angle and ignores the $p_T$ of the protojets. This implies that the protojets still available for the last recombination steps are those at large angle with respect to the core of the jet. Because the core of the jet carries large $p_T$, the directions of the protojets in the core do not change significantly as the recombinations proceed. Until the final steps, the recombinations involving the soft, peripheral protojets tend to occur only locally in $y$-$\phi$ and do not involve the large-$p_T$ protojets in the core of the jet. Therefore, the last recombinations defined by the CA algorithm are expected to involve two very different protojets. Typically one has large $p_T$, carrying most of the four-momentum of the jet, while the other has small $p_T$ and is located at the periphery of the jet. As we illustrate below, the last recombination will tend to exhibit large $\Delta R_{12}$, small $z$, large $a_1$ (near 1), and small $a_2$, where the last two points follow from the small $z$ and correspond to the $(z, \Delta R_{12})$ phase space of Fig. 2c.

In contrast, the $k_T$ algorithm orders recombinations according to both $p_T$ and angle. Thus the $k_T$ algorithm tends to recombine the soft protojets on the periphery of the jet earlier than the CA algorithm does. At the same time, the reduced dependence on the angle in the recombination metric implies that the angle between protojets in the final recombinations will be smaller for $k_T$ than for CA. While there is still a tendency for the last recombination in the $k_T$ algorithm to involve a soft protojet and the core protojet, the soft protojet tends to be less soft than with the CA algorithm (i.e., the $z$ value is larger), while the angular separation is smaller. Since this final soft protojet in the $k_T$ algorithm has participated in more previous recombinations than in the CA case, we expect the average $a_2$ value to be farther from zero and the $a_1$ value to be farther from 1. Generally, the $(z, \Delta R_{12})$ phase space for the final $k_T$ recombination is expected to look more like that illustrated in Figs. 2b and 2d (coupled with the boundary in Fig. 3b).
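The different endings of the two clustering histories can be reproduced on a toy jet. The sketch below uses assumptions of ours throughout. It uses the generalized-$k_T$ pair distance $\min(p_{Ti}, p_{Tj})^{2p}\,\Delta R_{ij}^2/D^2$, with $p = 1$ for $k_T$ and $p = 0$ for CA (a common parametrization, also used in FastJet). It recombines by four-vector addition (E scheme) and assumes all protojets already belong to one jet, so beam distances are ignored. The three protojets are hypothetical: a hard core, a hard nearly collinear emission, and a soft peripheral emission. The class and function names are likewise ours.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Protojet:
    """Four-momentum (E, px, py, pz); E-scheme recombination is plain addition."""
    e: float
    px: float
    py: float
    pz: float

    @classmethod
    def massless(cls, pt, y, phi):
        return cls(pt * math.cosh(y), pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(y))

    def __add__(self, other):
        return Protojet(self.e + other.e, self.px + other.px, self.py + other.py, self.pz + other.pz)

    @property
    def pt(self):
        return math.hypot(self.px, self.py)

    @property
    def y(self):
        return 0.5 * math.log((self.e + self.pz) / (self.e - self.pz))

    @property
    def phi(self):
        return math.atan2(self.py, self.px)


def delta_r(a, b):
    dphi = (a.phi - b.phi + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(a.y - b.y, dphi)


def pair_metric(a, b, p, d=1.0):
    """p = 1: k_T ordering; p = 0: CA ordering (angle only)."""
    return min(a.pt, b.pt) ** (2 * p) * delta_r(a, b) ** 2 / d**2


def last_recombination(protojets, p):
    """Merge the closest pair (in the chosen metric) until one protojet is left;
    return (z, DeltaR_12) of the final merging, z = min(pT1, pT2) / pT_parent."""
    jets = list(protojets)
    last = None
    while len(jets) > 1:
        pairs = [(i, j) for i in range(len(jets)) for j in range(i + 1, len(jets))]
        i, j = min(pairs, key=lambda ij: pair_metric(jets[ij[0]], jets[ij[1]], p))
        a, b = jets[i], jets[j]
        parent = a + b
        last = (min(a.pt, b.pt) / parent.pt, delta_r(a, b))
        jets = [jet for k, jet in enumerate(jets) if k not in (i, j)] + [parent]
    return last


# Hypothetical toy jet (pT in GeV; rapidity y and azimuth phi dimensionless).
toy_jet = [
    Protojet.massless(500.0, 0.0, 0.0),  # hard core
    Protojet.massless(100.0, 0.1, 0.0),  # hard, nearly collinear emission
    Protojet.massless(5.0, 0.8, 0.0),    # soft emission at the periphery
]
z_ca, dr_ca = last_recombination(toy_jet, p=0)
z_kt, dr_kt = last_recombination(toy_jet, p=1)
print(f"CA : z={z_ca:.4f}  DeltaR={dr_ca:.3f}")
print(f"kT : z={z_kt:.4f}  DeltaR={dr_kt:.3f}")
```

With these inputs, CA first merges the two hard protojets and ends by attaching the soft one at large angle, with $z \approx 0.008$ and $\Delta R_{12} \approx 0.78$. In contrast, $k_T$ absorbs the soft protojet into the collinear emission first and ends with $z \approx 0.17$ and $\Delta R_{12} \approx 0.14$, in line with the qualitative picture above.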



To summarize and illustrate this discussion, we have plotted distributions of $z$, $\Delta R_{12}$, and $a_1$ for the last recombination in a jet for the $k_T$ and CA algorithms in Figs. 9a-9f, for the matched QCD sample described previously. We plot distributions with and without a cut on the jet mass, where the cut is a narrow window (≈15 GeV) around the top quark mass. This cut selects heavy QCD jets, and for the $p_T$ window of 500-700 GeV it corresponds to a cut on $x_J$ of 0.06-0.12. These distributions reflect the combined influence of the QCD shower dynamics, the restricted kinematics from being in a jet, and the algorithm-dependent ordering effects discussed above. Most importantly, note the very strong enhancement at the smallest values of $z$ for the CA algorithm in Fig. 9a, which persists even after the heavy jet mass cut. Fig. 9a uses a log scale to make the differences between the distributions clearer and to better show the dynamic range. While the $k_T$ result in Fig. 9b is still peaked near zero when summed over all jet masses, the enhancement is not nearly as strong. After the heavy jet mass cut is applied, the distribution shifts to larger values of $z$, with an enhancement remaining at small values. Only in this last plot is there evidence of the lower limit on $z$ of order 0.1 expected from the earlier LL approximation results. Note also that the $z$ distributions all extend slightly past $z = 0.5$, indicating another small correction to the LL approximation arising from the true two-dimensional nature of the $(z, \Delta R_{12})$ phase space.
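The quoted $x_J$ range and the "order 0.1" lower limit on $z$ follow from a short calculation. This check assumes a top quark mass of $m_t \approx 175$ GeV (a value we supply; the text does not quote one). It takes $m_J = m_t$ at the edges of the $p_T$ window and uses the LL lower limit of Eq. (12) with $D = 1.0$:

$$x_J = \frac{m_t^2}{p_{TJ}^2}:\qquad \left(\frac{175}{700}\right)^2 = 0.0625,\qquad \left(\frac{175}{500}\right)^2 = 0.1225;$$

$$z_{\min} = \frac{1-\sqrt{1-4x_J/D^2}}{2}:\qquad x_J = 0.0625 \Rightarrow z_{\min} \approx 0.067,\qquad x_J = 0.1225 \Rightarrow z_{\min} \approx 0.14.$$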
Fig. 9c illustrates the expected enhancement of the CA distribution near the upper limit $\Delta R_{12} = D$, while the $k_T$ distribution in Fig. 9d shows a much broader distribution than CA, with an enhancement for small $\Delta R_{12}$ values. Once the heavy jet mass cut is applied, both algorithms exhibit the lower kinematic cutoff on $\Delta R_{12}$ suggested by the LL approximation results, as both distributions shift to larger values of the angle. This shift serves to enhance the CA peak at the upper limit and moves the lower-end enhancement in $k_T$ to substantially larger values of $\Delta R_{12}$.

FIG. 9: Distributions in $z$, $\Delta R_{12}$, and the scaled (heavier) daughter mass $a_1$ for QCD jets, using the CA and $k_T$ algorithms, with (dashed) and without (solid) a cut around the top quark mass. The jets have $p_T$ between 500 and 700 GeV with $D = 1.0$. Note the log scale for the $z$ distribution of CA jets.

The CA algorithm's bias toward large $a_1$ is demonstrated in Fig. 9e. We can see that requiring a heavy jet enhances the large-$a_1$ peak and also results in a much smaller enhancement around $a_1 \approx 0.2$. The $k_T$ distribution in $a_1$, shown in Fig. 9f, exhibits a broad enhancement around $a_1 \approx 0.4$. This distribution is relatively unchanged after the jet mass cut. To give some insight into the correlations between $z$ and $\Delta R_{12}$, in Fig. 10 we plot the distribution of both variables simultaneously for both algorithms, with no jet mass cut applied. The very strong enhancement at small $z$ and large $\Delta R_{12}$ for CA is evident in this plot. For $k_T$, there is still an enhancement at small $z$ and large $\Delta R_{12}$, but there is support over the whole range in $z$ and $\Delta R_{12}$, with the shaping due to the $z \times \Delta R_{12}$ dependence in the metric clearly evident. Note that the $k_T$ distribution is closer to what one would expect from QCD alone, with enhancements at both small $z$ and small $\Delta R_{12}$, while the CA distribution is asymmetrically shaped away from the QCD-like result. Finally, we should recall, as indicated by Fig. 8, that the jets found by the two algorithms tend to be slightly different, with the $k_T$ algorithm recombining slightly more of the original (typically soft) protojets at the periphery and leading to slightly larger jet masses.

FIG. 10: Combined distribution in $z$ and $\Delta R_{12}$ for QCD jets, using the CA (left) and $k_T$ (right) algorithms, for jets with $p_T$ between 500 and 700 GeV with $D = 1.0$. Each bin shows the relative density, normalized to 1 for the largest bin.

Because the QCD shower is present in all jets and is responsible for the complexity in the jet substructure, the systematic effects discussed above will be present in all jets. While the kinematics of a heavy particle decay is distinct from QCD in certain respects, we will find that these effects still present themselves in jets containing the decay of a heavy particle. This reduces our ability to identify jets containing a heavy particle, and will lead us to propose a technique to reduce these effects. In the following section, we study the kinematics of heavy particle decays and discuss where these systematic effects arise.



IV. RECONSTRUCTING HEAVY PARTICLES 

Recombination algorithms have the potential to reconstruct the decay of a heavy particle. Ideally, the substructure of a jet may be used to identify jets coming from a decay and to reject the QCD background to those jets. In this section, we investigate a pair of unpolarized parton-level decays: a heavy particle decaying into two massless quarks (a $1 \to 2$ decay), and a top quark decaying into three massless quarks (a two-step decay). For each decay, we study the available phase space in terms of the lab-frame variables $\Delta R_{12}$ and $z$, and the shaping of kinematic distributions imposed by the requirement that the decay be reconstructed in a single jet. We determine the kinematic regime where decays are reconstructed, and contrast this with the kinematics of a $1 \to 2$ splitting in QCD.



A. $1 \to 2$ Decays

We begin by considering a $1 \to 2$ decay with massless daughters. An unpolarized decay has a simple phase space in terms of the rest-frame variables $\cos\theta_0$ and $\phi_0$:

$$\frac{d^2N_0}{d\cos\theta_0\,d\phi_0} = \frac{1}{4\pi}. \tag{14}$$



Recall from Sec. II B that $\theta_0$ and $\phi_0$ are the polar and azimuthal angles of the heavier daughter particle (when the daughters are identical, we can take these to be the angles of a randomly selected daughter of the pair) in the parent particle rest frame, relative to the direction of the boost to the lab frame. In general, we use $N_0$ to label the distribution of all decays, while $N$ labels the distribution of decays reconstructed inside a single jet. $N_0$ is normalized to unity, so that for any variable set $\Phi$,

$$\int d\Phi\,\frac{dN_0}{d\Phi} = 1. \tag{15}$$



The distribution $N$ is defined from $N_0$ by selecting those decays that fit in a single jet, so that generically

$$\frac{dN}{d\Phi} = \int d\Phi'\,\frac{dN_0}{d\Phi'}\,\delta(\Phi' - \Phi)\,\Theta(\text{single jet reconstruction}). \tag{16}$$

$N$ is naturally normalized to the total fraction of reconstructed decays. The constraints of single-jet reconstruction depend on the decay and on the jet algorithm used, and abstractly take the form of a set of $\Theta$ functions specifying the ordering and limits on recombinations. For a $1 \to 2$ decay and a recombination-type algorithm, the only constraint is that the daughters must be separated by an angle less than $D$:

$$\Delta R_{12} < D. \tag{17}$$



Since the kinematic limits imposed by reconstruction are sensitive to the boost $\gamma$ of the parent particle, we will want to consider the quantities of interest at a variety of $\gamma$ values. To illustrate this $\gamma$ dependence, we first find the total fraction of all decays that are reconstructed in a single jet for a given value of the boost. We call this fraction $f_R(\gamma)$:

$$f_R(\gamma) = \int d\cos\theta_0\,d\phi_0\,\frac{d^2N_0}{d\cos\theta_0\,d\phi_0}\,\Theta(D - \Delta R_{12}). \tag{18}$$



In Fig. 11 we plot $f_R(\gamma)$ vs. $\gamma$ for several values of $D$. The reconstruction fraction rises rapidly from no reconstruction to nearly complete reconstruction over a very narrow range in $\gamma$. This indicates that $\Delta R_{12}$ is highly dependent on $\gamma$ for fixed $\cos\theta_0$ and $\phi_0$, as we will see below. Furthermore, the value of $\gamma$ at which $f_R(\gamma)$ turns on is very sensitive to the value of $D$, with very large boosts required to reconstruct a particle in a single jet except for larger values of $D$. This turn-on for increasing $\gamma$ is the same effect as the $(z, \Delta R_{12})$ phase space moving into the allowed region below $\Delta R_{12} = D$ in Fig. 2a as $x_J$ is reduced.

FIG. 11: Reconstruction fractions $f_R(\gamma)$ as a function of $\gamma$ for various $D$.
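Equation (18) can be evaluated without any approximation by boosting the two daughters explicitly. The sketch below rests on our own conventions and choices. The parent moves along $x$, transverse to a beam along $z$, and $\phi_0$ is chosen so that $\sin\phi_0$ controls the beam (rapidity) component, as in Eq. (20) below. The integral is done on a deterministic midpoint grid. The function names and grid sizes are illustrative.

```python
import math

def daughter_delta_r(gamma, cos_t, phi0):
    """Exact lab-frame Delta R_12 for a 1 -> 2 decay to massless daughters.
    Parent along x (transverse), beam along z; daughters have unit rest-frame
    energy and back-to-back rest-frame directions (cos_t, sin_t cos phi0, sin_t sin phi0)."""
    beta = math.sqrt(1.0 - 1.0 / gamma**2)
    sin_t = math.sqrt(max(0.0, 1.0 - cos_t**2))
    rapidities, azimuths = [], []
    for sign in (1.0, -1.0):
        px = sign * cos_t
        py = sign * sin_t * math.cos(phi0)
        pz = sign * sin_t * math.sin(phi0)
        e_lab = gamma * (1.0 + beta * px)   # boost along x
        px_lab = gamma * (px + beta)
        rapidities.append(0.5 * math.log((e_lab + pz) / (e_lab - pz)))
        azimuths.append(math.atan2(py, px_lab))
    dphi = (azimuths[0] - azimuths[1] + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(rapidities[0] - rapidities[1], dphi)

def reconstruction_fraction(gamma, d=1.0, n_cos=400, n_phi=200):
    """Eq. (18) on a midpoint grid: the rest-frame distribution is flat in
    (cos theta_0, phi_0), so f_R is the fraction of cells with Delta R_12 < D."""
    inside = 0
    for i in range(n_cos):
        cos_t = -1.0 + (i + 0.5) * 2.0 / n_cos
        for j in range(n_phi):
            phi0 = (j + 0.5) * 2.0 * math.pi / n_phi
            if daughter_delta_r(gamma, cos_t, phi0) < d:
                inside += 1
    return inside / (n_cos * n_phi)

for gamma in (1.5, 2.5, 3.0, 5.0, 20.0):
    print(f"gamma={gamma:5.1f}  f_R={reconstruction_fraction(gamma):.3f}")
```

For $D = 1.0$, no decay is reconstructed at $\gamma = 1.5$, and nearly all are reconstructed by $\gamma = 20$, which reflects the sharp turn-on described above.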

To better understand the effect that reconstruction has on the phase space for decays, we would like to find the distribution of $1 \to 2$ decays in terms of lab-frame variables,

$$\frac{d^2N_0}{dz\,d\Delta R_{12}}. \tag{19}$$



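With two massless daughters, $\Delta R_{12}$ is given in terms of rest-frame variables by

$$\Delta R_{12}^2 = \left[\tanh^{-1}\!\left(\frac{2\gamma\sin\theta_0\sin\phi_0}{1+\sin^2\theta_0\left(\beta^2\gamma^2+\sin^2\phi_0\right)}\right)\right]^2 + \left[\tan^{-1}\!\left(\frac{2\beta\gamma\sin\theta_0\cos\phi_0}{\sin^2\theta_0\left(\beta^2\gamma^2+\sin^2\phi_0\right)-1}\right)\right]^2, \tag{20}$$

with $\beta = \sqrt{1 - \gamma^{-2}}$. This relation cannot be inverted analytically, meaning we cannot write the Jacobian for the transformation

$$\frac{d^2N_0}{d\cos\theta_0\,d\phi_0} \longrightarrow \frac{d^2N_0}{dz\,d\Delta R_{12}} \tag{21}$$

in closed form. However, $\Delta R_{12}$ has some simple limits. In particular, when the boost $\gamma$ is large, to leading order in $\gamma^{-1}$,

$$\Delta R_{12} = \frac{2}{\gamma\sin\theta_0}. \tag{22}$$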



This limit is only valid for $\sin\theta_0 \gg \gamma^{-1}$, but, as we will see, this is the region of phase space where the decay will be reconstructed in a single jet. The large-boost approximation describes the key features of the kinematics and gives a simple picture of the kinematic distributions when particles are reconstructed in a single jet.

Since $\gamma = \sqrt{1 + 1/x_J}$, this limit is equivalent to the small-angle limit we took in Sec. III A. (For $\Delta R^2 \ll 1$, $x_J \approx z(1-z)\Delta R^2 \ll 1$.) We can see this in Eq. (22), where $\Delta R \propto 1/\gamma$.

The value of $z$ is also simple in the large-boost approximation. In this limit,

$$z = \frac{1 - |\cos\theta_0|}{2} + \mathcal{O}(\gamma^{-2}). \tag{23}$$
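The quality of Eqs. (22) and (23) can be gauged by comparing them with the exact lab-frame values. The sketch below computes the exact $z$ and $\Delta R_{12}$ from explicit boosts, using our convention of parent along $x$ and beam along $z$. As an internal consistency check, it also evaluates the closed form of Eq. (20). Our implementation of Eq. (20) takes the principal arctan branch, so it assumes $\sin^2\theta_0(\beta^2\gamma^2+\sin^2\phi_0) > 1$. All function names are illustrative.

```python
import math

def exact_z_and_delta_r(gamma, cos_t, phi0):
    """Exact lab-frame z and Delta R_12 from explicit boosts (parent along x,
    beam along z, massless daughters with unit rest-frame energy)."""
    beta = math.sqrt(1.0 - 1.0 / gamma**2)
    sin_t = math.sqrt(1.0 - cos_t**2)
    pts, ys, phis = [], [], []
    for sign in (1.0, -1.0):
        px = sign * cos_t
        py = sign * sin_t * math.cos(phi0)
        pz = sign * sin_t * math.sin(phi0)
        e_lab, px_lab = gamma * (1.0 + beta * px), gamma * (px + beta)
        pts.append(math.hypot(px_lab, py))
        ys.append(0.5 * math.log((e_lab + pz) / (e_lab - pz)))
        phis.append(math.atan2(py, px_lab))
    parent_pt = 2.0 * gamma * beta  # parent mass is 2 in these units
    dphi = (phis[0] - phis[1] + math.pi) % (2.0 * math.pi) - math.pi
    return min(pts) / parent_pt, math.hypot(ys[0] - ys[1], dphi)

def delta_r_eq20(gamma, cos_t, phi0):
    """Closed form of Eq. (20) on the principal arctan branch."""
    beta = math.sqrt(1.0 - 1.0 / gamma**2)
    s = math.sqrt(1.0 - cos_t**2)
    k = s**2 * (beta**2 * gamma**2 + math.sin(phi0) ** 2)
    dy = math.atanh(2.0 * gamma * s * math.sin(phi0) / (1.0 + k))
    dphi = math.atan(2.0 * beta * gamma * s * math.cos(phi0) / (k - 1.0))
    return math.hypot(dy, dphi)

def large_boost(gamma, cos_t):
    """(z, Delta R_12) from Eqs. (23) and (22)."""
    return (1.0 - abs(cos_t)) / 2.0, 2.0 / (gamma * math.sqrt(1.0 - cos_t**2))

for gamma in (5.0, 50.0):
    print(gamma, exact_z_and_delta_r(gamma, 0.3, 0.7), large_boost(gamma, 0.3))
```

The residual $\phi_0$ dependence of the exact values shrinks like $\gamma^{-2}$. This is why the large-boost limit can integrate out $\phi_0$.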



With the large-boost approximation, $z$ and $\Delta R_{12}$ are both independent of $\phi_0$. As noted earlier, both $\Delta R_{12}$ and $z$ depend on $\phi_0$ only through terms that are suppressed by inverse powers of $\gamma$ (cf. Figs. 1 and 2), and taking the large-boost limit eliminates this dependence. Therefore, in this limit we can integrate out $\phi_0$ and find the distributions in $z$ and $\Delta R_{12}$ for all decays. For $z$ the distribution is simply flat:

$$\frac{dN_0}{dz} = 2\,\Theta\!\left(\frac{1}{2} - z\right)\Theta(z). \tag{24}$$



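We have included the limits for clarity. For $\Delta R_{12}$, the distribution is

$$\frac{dN_0}{d\Delta R_{12}} = \frac{4}{\gamma^2\Delta R_{12}^2}\,\frac{\Theta\!\left(\Delta R_{12} - 2\gamma^{-1}\right)}{\sqrt{\Delta R_{12}^2 - 4\gamma^{-2}}}. \tag{25}$$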



This distribution has a lower cutoff requiring $\Delta R_{12} > 2\gamma^{-1}$. This is close to the true lower limit on $\Delta R_{12}$, which comes from setting $\phi_0 = 0$ in the exact formula for $\Delta R_{12}$ and simplifying. The exact lower limit is

$$\Delta R_{12} \geq 2\csc^{-1}\gamma, \tag{26}$$

which is within 5% of $2\gamma^{-1}$ for values of $\gamma$ for which $f_R(\gamma) > 0$. Note that in Eq. (25) there is an enhancement at the lower cutoff in $\Delta R_{12}$ due to the square-root singularity arising from the change of variables, just as there was in the QCD result in Eq. (13). Thus the distribution in $\Delta R_{12}$ is highly localized at the cutoff, which is a function of $\gamma$.
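The 5% statement can be checked with a few lines of arithmetic. The sketch assumes $D = 1.0$, the jet size used in most of the plots. For a jet of size $D$, $f_R(\gamma)$ first becomes nonzero when $2\csc^{-1}\gamma = D$, that is, at $\gamma = 1/\sin(D/2)$. Because the ratio $\csc^{-1}\gamma/\gamma^{-1} = \arcsin(u)/u$ with $u = 1/\gamma$ grows with $u$, the largest deviation occurs at that turn-on point. The function names are ours.

```python
import math

def exact_min_delta_r(gamma):
    """Eq. (26): 2 csc^{-1}(gamma) = 2 arcsin(1/gamma)."""
    return 2.0 * math.asin(1.0 / gamma)

def approx_min_delta_r(gamma):
    """Lower cutoff of Eq. (25)."""
    return 2.0 / gamma

def turn_on_boost(d):
    """Smallest gamma with f_R(gamma) > 0: solve 2 arcsin(1/gamma) = D."""
    return 1.0 / math.sin(d / 2.0)

d = 1.0  # jet size assumed for this check
gamma_0 = turn_on_boost(d)
for gamma in (gamma_0, 3.0, 5.0, 10.0):
    ratio = exact_min_delta_r(gamma) / approx_min_delta_r(gamma)
    print(f"gamma={gamma:6.3f}  exact/approx={ratio:.4f}")
```

At the turn-on boost $\gamma \approx 2.09$, the ratio is about 1.043. It approaches 1 quickly as $\gamma$ grows.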
In Fig. 12 we plot the true distribution $dN_0/dz$, found numerically without the large-boost approximation, for several values of $\gamma$. Qualitatively, the true distribution is very similar to the approximate one in Eq. (24), which is flat. The peak in the distribution at small $z$ values comes from the reduced phase space as $z \to 0$, and the peak is lower for larger boosts. Likewise, the exact distribution $dN_0/d\Delta R_{12}$ is very similar to the large-boost result; in Fig. 13 we plot $dN_0/d\Delta R_{12}$ with no approximation. The distribution in $\Delta R_{12}$ is localized at the lower limit, especially for larger boosts. This provides a useful rule: the opening angle of a decay is highly correlated with the transverse boost of the parent particle. Note that the relevant boost is the transverse one because the angular measure $\Delta R$ is invariant under longitudinal boosts (recall that in the example here, we have set the parent particle to be transverse).

The constraint imposed by reconstruction is simple to interpret in the large-boost approximation. In terms of $\sin\theta_0$, the constraint $\Delta R_{12} < D$ requires $\sin\theta_0 > 2/(\gamma D)$, which excludes the region where the approximation breaks down. Therefore the large-boost approximation is apt for describing the kinematics of a reconstructed decay. In Fig. 14 we plot the distribution $dN/d\cos\theta_0$, where the implied sharp cutoff is apparent (and should be compared to what we observed in Fig. 1a).

FIG. 12: The distribution of all decays in $z$ for several values of $\gamma$.

FIG. 13: The distribution of all decays in $\Delta R_{12}$ for several values of $\gamma$.



This distribution is easy to understand in the rest frame of the decay. When $|\cos\theta_0|$ is close to 1, one of the daughters is nearly collinear with the direction of the boost to the lab frame, and the other is nearly anti-collinear. The anti-collinear daughter is not sufficiently boosted to have $\Delta R_{12} < D$ with the collinear daughter, and the parent particle is not reconstructed. As $|\cos\theta_0|$ decreases, the two daughters can be recombined in the same jet; this transition is rapid because the $\phi_0$ dependence of the kinematics is small. We now look at the distributions of $z$ and $\Delta R_{12}$ when we require reconstruction.

FIG. 14: The reconstructed distribution $dN/d\cos\theta_0$ with $D = 1.0$ for various values of $\gamma$.

Because $z$ is linearly related to $|\cos\theta_0|$ at large boosts, the distribution in $z$ has a simple form:



$$\frac{dN}{dz} = 2\,\Theta\!\left(z - \frac{1 - \sqrt{1 - 4/(\gamma^2D^2)}}{2}\right)\Theta\!\left(\frac{1}{2} - z\right). \tag{27}$$

Comparing to Eq. (24), we see that requiring reconstruction simply cuts out the region of phase space at small $z$. This is confirmed in the exact distribution $dN/dz$, shown in Fig. 15. The small-$z$ decays that are not reconstructed come from the regions of phase space with $|\cos\theta_0|$ near 1, just as in the previous discussion. In these decays, the backward-going (anti-collinear) daughter in the parent rest frame is boosted to have small $p_T$ in the lab frame. Comparing to Fig. 6, the distribution in $z$ for QCD splittings, we see first that the cutoffs on the distributions are similar (they are not identical because of the LL approximation used in Fig. 6). However, the QCD distribution has an enhancement at small $z$ values, due to the QCD soft singularity, that the distribution for reconstructed decays does not exhibit.

FIG. 15: The distribution of reconstructed decays in $z$ for several values of $\gamma$.
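A small Monte Carlo sketch makes the statement behind Eq. (27) concrete. It samples unpolarized decays with $\cos\theta_0$ uniform, maps them to the lab with the large-boost formulas of Eqs. (22) and (23), and keeps those with $\Delta R_{12} < D$. The surviving $z$ values should be flat above the cutoff. In this approximation, integrating Eq. (25) up to $D$ gives an accepted fraction of $\sqrt{1 - 4/(\gamma^2 D^2)}$. The sample size, seed, and function name are our own choices.

```python
import math
import random

def sample_reconstructed_z(gamma, d=1.0, n=200_000, seed=7):
    """Sample cos(theta_0) uniformly, apply Eqs. (22)-(23), keep Delta R_12 < D."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        cos_t = rng.uniform(-1.0, 1.0)
        sin_t = math.sqrt(1.0 - cos_t * cos_t)
        if sin_t == 0.0:
            continue  # daughters exactly along the boost axis: never reconstructed
        if 2.0 / (gamma * sin_t) < d:
            kept.append((1.0 - abs(cos_t)) / 2.0)
    return kept, len(kept) / n

gamma, d = 3.0, 1.0
zs, fraction = sample_reconstructed_z(gamma, d)
z_cut = 0.5 * (1.0 - math.sqrt(1.0 - 4.0 / (gamma * d) ** 2))  # lower limit in Eq. (27)
print(f"accepted fraction={fraction:.4f}  z_cut={z_cut:.4f}  min z={min(zs):.4f}")
```

For $\gamma = 3$ and $D = 1.0$, the cutoff is $z \approx 0.127$. This is close to the QCD value from Eq. (12) at $x_J = 1/(\gamma^2-1)$, which is about 0.146. The two agree within the approximations used.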

The distribution of reconstructed particles in the variable $\Delta R_{12}$ is related simply to the distribution of all decays in the same variable:

$$\frac{dN}{d\Delta R_{12}} = \frac{dN_0}{d\Delta R_{12}}\,\Theta(D - \Delta R_{12}), \tag{28}$$

which means that the distribution $dN/d\Delta R_{12}$ is given by Fig. 13 with a cutoff at $\Delta R_{12} = D$. Note that this distribution is very close in shape to the distribution of QCD branchings versus $\Delta R_{12}$ displayed in Eq. (13) and Fig. 7. This similarity arises from the fact that the most important factor in the shape is the square-root singularity, which arises from the change of variables in both cases and is not indicative of the underlying differences in dynamics.

In this subsection, we have considered $1 \to 2$ decays with massless daughters and a fixed boost, and the shaping effects that arise from requiring that the decay be reconstructed in a jet. We have found that decays share many kinematic features with QCD branchings into two massless partons at fixed $x_J$. In particular, the cutoffs on the distributions are set by the kinematics and do not depend on the process. Comparing Eqs. (12) and (27), and Eqs. (13) and (28), we see that the upper and lower cutoffs are the same within the approximations used. The dominant feature in the $\Delta R_{12}$ distribution, the square-root singularity at the lower bound, is also a kinematic effect shared by both decays and QCD branchings. On the other hand, the $z$ distributions are distinct: while QCD branchings are enhanced at small $z$, for decays the distribution in $z$ is flat over the allowed range.

B. Two-step Decays

We now turn our attention to two-step decays, which exhibit a more complex substructure than a single $1 \to 2$ decay. Compared to one-step decays, two-step decays offer new insights into the ordering effects of the $k_T$ and CA algorithms, highlight the shaping effects of the algorithm on the jet substructure, and offer a surrogate for the cascade decays that are often featured in new physics scenarios. The top quark is a good example of such a decay, and we focus on it in this section. Unlike for a $1 \to 2$ decay, in reconstructing a multi-step decay at the parton level the choice of jet algorithm matters; different algorithms can give different substructure. We take the same approach as for the $1 \to 2$ decay, studying the kinematics of the parton-level top quark decay in terms of the lab-frame variables $\Delta R_{12}$ and $z$.

We label the top quark decay $t \to Wb$, with $W \to qq'$. In this discussion, requiring that the top quark be reconstructed means that the $W$ must be recombined from $q$ and $q'$ first, followed by the $b$. This recombination ordering reproduces the decay of the top, and the $W$ is a daughter subjet of the top quark. For the $k_T$ algorithm, reconstructing the top quark in a single jet imposes the following constraints on the partons:



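$$\begin{aligned}
\min(p_{Tq}, p_{Tq'})\,\Delta R_{qq'} &< \min(p_{Tq}, p_{Tb})\,\Delta R_{bq},\\
\min(p_{Tq}, p_{Tq'})\,\Delta R_{qq'} &< \min(p_{Tq'}, p_{Tb})\,\Delta R_{bq'},\\
\Delta R_{qq'} &< D, \quad\text{and}\\
\Delta R_{bW} &< D.
\end{aligned} \tag{29}$$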



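For the CA algorithm the relations are strictly in terms of the angle:

$$\begin{aligned}
\Delta R_{qq'} &< \Delta R_{bq},\\
\Delta R_{qq'} &< \Delta R_{bq'},\\
\Delta R_{qq'} &< D, \quad\text{and}\\
\Delta R_{bW} &< D.
\end{aligned} \tag{30}$$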



The kinematic limits requiring the decay to be reconstructed in a single jet are the same for the two algorithms, but fixing the ordering of the two recombinations requires a different restriction for each algorithm, which in turn biases the distributions of kinematic variables.
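The ordering conditions in Eqs. (29) and (30) are easy to test on a single parton-level configuration. The sketch below makes several assumptions of ours. It treats the three partons as massless, forms the $W$ by four-vector addition, uses the $\min(p_T)\,\Delta R$ form of the $k_T$ condition exactly as written in Eq. (29), and ignores beam distances. The example configuration is hypothetical: a hard $W$ pair with a soft $b$ at wider angle, chosen to show that the two algorithms can disagree.

```python
import math

def four_vector(pt, y, phi):
    """Massless (E, px, py, pz) from (pT, rapidity, azimuth)."""
    return (pt * math.cosh(y), pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(y))

def y_phi(p):
    e, px, py, pz = p
    return 0.5 * math.log((e + pz) / (e - pz)), math.atan2(py, px)

def delta_r(p1, p2):
    y1, phi1 = y_phi(p1)
    y2, phi2 = y_phi(p2)
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(y1 - y2, dphi)

def top_reconstructed(q, q_prime, b, algorithm, d=1.0):
    """Check Eq. (29) (algorithm='kt') or Eq. (30) (algorithm='ca') for three
    massless partons given as (pT, y, phi) tuples."""
    pq, pqp, pb = four_vector(*q), four_vector(*q_prime), four_vector(*b)
    pw = tuple(u + v for u, v in zip(pq, pqp))  # W = q + q' (E-scheme)
    ptq, ptqp, ptb = q[0], q_prime[0], b[0]
    dr_qqp, dr_bq, dr_bqp = delta_r(pq, pqp), delta_r(pb, pq), delta_r(pb, pqp)
    if algorithm == "ca":
        w_first = dr_qqp < dr_bq and dr_qqp < dr_bqp
    elif algorithm == "kt":
        w_metric = min(ptq, ptqp) * dr_qqp
        w_first = (w_metric < min(ptq, ptb) * dr_bq
                   and w_metric < min(ptqp, ptb) * dr_bqp)
    else:
        raise ValueError("algorithm must be 'ca' or 'kt'")
    return w_first and dr_qqp < d and delta_r(pb, pw) < d

# Hypothetical configuration (pT in GeV): hard W pair, soft b at wider angle.
q, q_prime, b = (200.0, 0.0, 0.0), (150.0, 0.0, 0.3), (10.0, 0.0, 0.9)
print(top_reconstructed(q, q_prime, b, "ca"), top_reconstructed(q, q_prime, b, "kt"))
# prints: True False
```

Here CA pairs $q$ and $q'$ first because they are closest in angle. In contrast, $k_T$ merges the soft $b$ with $q$ first, since $10 \times 0.9 < 150 \times 0.3$. The top is therefore not reconstructed by $k_T$, even though all partons lie within $D$.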

The common requirement that the top quark be reconstructed in a single jet, $\Delta R_{qq'} < D$ and $\Delta R_{Wb} < D$, is straightforward to understand in terms of the rest-frame variable $\cos\theta_0$, where $\theta_0$ is here the polar angle in the top quark rest frame between the $W$ and the boost direction to the lab frame. For $\cos\theta_0 \sim 1$, the $W$ has a large transverse boost in the lab frame, so $\Delta R_{qq'} < D$, but the angle between the $W$ and $b$ will be large (as was the case for the corresponding $1 \to 2$ decay in the previous section). For $\cos\theta_0 \sim -1$, the $W$ transverse boost is small, and $\Delta R_{qq'}$ will be large. Therefore, we only expect to reconstruct top quarks in a single jet when $|\cos\theta_0|$ is not near 1. Which decays are actually reconstructed, however, depends on the algorithm.

If the CA algorithm correctly reconstructs the top quark, the two quarks from the $W$ decay must be the closest pair (in $\Delta R$) of the three final state particles. This requirement strongly selects decays where the $W$ opening angle, $\Delta R_{qq'}$, is smaller than the top quark opening angle, $\Delta R_{Wb}$. Therefore, only decays with a large (transverse) $W$ boost will be reconstructed by the CA algorithm. In terms of $\cos\theta_0$, the fraction of decays that are reconstructed will increase as we increase $\cos\theta_0$ towards the upper limit where $\Delta R_{Wb} \approx D$, and the reconstruction fraction will be small for lower values of $\cos\theta_0$.

The $k_T$ algorithm orders recombinations by $p_T$ as well as angle, and the set of reconstructed decays is understood most easily by contrasting with CA. As the transverse boost of the $W$ decreases, on average the $p_T$ of the $q$ and $q'$ decreases while the $p_T$ of the $b$ increases. Therefore, while $\Delta R_{qq'}$ is increasing, $\min(p_{Tq}, p_{Tq'})$ is decreasing, and these competing effects suggest that $k_T$ reconstructs decays with smaller values of $\cos\theta_0$ than CA, and that the dependence on $\cos\theta_0$ is not as strong.

The effect of the CA and $k_T$ algorithms on the observed distribution in $\cos\theta_0$ is shown in Fig. 16, where we plot the distribution of $\cos\theta_0$ for reconstructed top quarks for both algorithms. The top boost is fixed to $\gamma = 3$. We observe that the kinematic limit near $\cos\theta_0 \approx 0.8$ is common to both algorithms, and that $\cos\theta_0 \approx -1$ is not accessed by either algorithm. As expected, the distribution for the CA algorithm falls off more sharply than for $k_T$ at lower values of $\cos\theta_0$.

FIG. 16: $dN/d\cos\theta_0$ vs. $\cos\theta_0$, with $\gamma = 3$, for both the $k_T$ and CA algorithms. The underlying distribution $dN_0/d\cos\theta_0 = 1/2$ is plotted as the dotted line for reference.

Next, we look at distributions in $z$ and $\Delta R_{Wb}$. Just as in the $1 \to 2$ decay, we expect decays with small $z$ not to be correctly reconstructed. Small values of $z$ occur when the $W$ or $b$ is soft, and therefore produced nearly backwards-going in the top rest frame. This corresponds to $\cos\theta_0 \approx \pm 1$, and from Fig. 16 these decays are not reconstructed. In Fig. 17 we plot the distribution in $z$ for all decays, $dN_0/dz$, and the distribution for reconstructed decays, $dN/dz$, for a boost of $\gamma = 3$.




FIG. 17: $dN_0/dz$ (all decays) and $dN/dz$ (reconstructed decays), with $\gamma = 3$.

In $dN_0/dz$, the discontinuity at $z \approx 0.2$ arises from the fact that the $W$ is sometimes softer than the $b$, but has a minimum $p_T$. The extra weight in $dN_0/dz$ for $z$ above this value comes from the decays where the $W$ is softer than the $b$. Note that these decays are rarely reconstructed, especially for CA: the distribution $dN/dz$ is smooth, and has little additional support in the region where the $W$ is softer. This correlates with the fact that decays with negative $\cos\theta_0$ are rarely reconstructed with CA, but more frequently with $k_T$. The distribution $dN/dz$ has a lower cutoff that corresponds to the upper cutoff in $\cos\theta_0$ seen in Fig. 16. As the boost $\gamma$ of the top increases, the cutoff at small $z$ decreases, since the limit in $\cos\theta_0$ beyond which $\Delta R_{Wb} > D$ moves towards 1.

The opening angle $\Delta R_{Wb}$ of the top quark decay also illustrates how strongly the kinematics are shaped by the jet algorithm. When $\cos\theta_0 \approx -1$, for sufficient boosts $\Delta R_{Wb}$ is small because the $W$ is boosted forward in the lab frame, but these decays are not reconstructed because the ordering of recombinations will typically be incorrect and the $W$ decay products may not lie within $\Delta R_{qq'} < D$. For $\cos\theta_0 \approx 1$, $\Delta R_{Wb}$ will exceed $D$ and the top will not be reconstructed. In Fig. 18 we plot the distribution $dN_0/d\Delta R_{Wb}$ of the angle between the $W$ and $b$ in all top decays for a top boost of $\gamma = 3$, as well as the distribution $dN/d\Delta R_{12}$ of the angle of the last recombination for reconstructed top quarks with the $k_T$ and CA algorithms. Note that when the top quark is reconstructed at the parton level, $\Delta R_{12} = \Delta R_{Wb}$. The difference in $dN/d\Delta R_{12}$ between the $k_T$ and CA algorithms reflects their different recombination orderings. Because CA orders strictly by angle, it requires $\Delta R_{12} = \Delta R_{Wb} > \Delta R_{qq'}$, so $\Delta R_{12}$ tends to be larger than for $k_T$. The $k_T$ algorithm prefers smaller values of $\Delta R_{Wb}$, because in these cases the $W$ is softer, so that the value of the $k_T$ metric for recombining $q$ and $q'$, $\min(p_{Tq}, p_{Tq'})\,\Delta R_{qq'}$, is smaller.

FIG. 18: $dN_0/d\Delta R_{Wb}$ (all decays) and $dN/d\Delta R_{12}$ (reconstructed decays), with $\gamma = 3$.



C. Hadron-level Decays 

To this point, we have looked at the parton-level kinematics of the top decay. However, we cannot expect the jet algorithm to faithfully represent the kinematics of the parton-level top decay in jets that include the physics of showering and hadronization. That is, the systematic effects of the jet algorithm, similar to those seen in QCD jets in Section III, can be expected to appear in distributions of kinematic variables for jets reconstructing the top quark mass. The substructure of a jet that reconstructs the top quark mass may not match the kinematics of that decay because of systematic effects of the jet algorithm. For instance, in the CA algorithm we expect that soft recombinations will occur at the last recombination step, even for jets that contain the decay products of a top quark. This can make the substructure look more like a heavy QCD jet than a top quark decay, and consequently the jet may not be properly identified.

To demonstrate this point, in Fig. 19 we plot the distribution in $z$ for jets with mass within a window around the top quark mass. The data represent simulated $t\bar{t}$ events as described in Appendix A. In this sample, the top quarks have $p_T$ between 500 and 700 GeV, so that many are expected to be reconstructed in a single jet.

FIG. 19: Distribution in $z$ for jets with the top mass in the $t\bar{t}$ sample. The jets have $p_T$ between 500 and 700 GeV, and $D = 1.0$. Note that the $k_T$ distribution is scaled up by a factor of 5 to make the scales comparable.

The distribution for CA jets is very different from the parton-level distribution plotted in Fig. 17. The excess at small values of $z$ arises from soft recombinations in the CA algorithm, which make the distribution similar to the distribution in $z$ from QCD jets shown in Figs. 9a and 9b. For the $k_T$ algorithm, there are rarely soft recombinations late in the algorithm, because the metric orders according to $z$ as well as $\Delta R$. However, the $k_T$ algorithm tends to have a much broader mass distribution for reconstructed tops than the CA algorithm, since soft particles that dominate the periphery of the jet are recombined early in the algorithm. This means that soft energy depositions in the calorimeter near the decay products of a top quark have a higher probability of being included in the jet and broadening the reconstructed top mass distribution. In Fig. 20 we plot the jet mass distribution in the neighborhood of the top mass for jets in the same $t\bar{t}$ sample as in Fig. 19 for both algorithms.

The top mass peak is broadened for the $k_T$ algorithm relative to CA. From the point of view of jet substructure, we cannot identify vertex-specific variables (such as $z$ and $\Delta R$) that characterize this broadening, because it is due to recombinations early in the algorithm. However, we will find that techniques used to remove the systematic effects of the algorithm from the substructure of jets are effective in narrowing mass distributions.

FIG. 20: Distribution in jet mass for jets in the neighborhood of the top mass in $t\bar{t}$ events for the CA (black) and $k_T$ (red) algorithms.



V. IDENTIFYING RECONSTRUCTED HEAVY 
PARTICLES WITH JET SUBSTRUCTURE 

In the previous two sections we examined several kinematic distributions for QCD splittings and for heavy particle decays. These studies fall into two categories: parton-level, dealing with the fundamental $1 \to 2$ processes, and hadron-level, including the physics of showering and hadronization. While the parton-level studies are important for understanding the kinematics of reconstructed decays and how they differ from QCD, the hadron-level studies encompass the effects of the QCD shower and the jet algorithm. We explore these effects further in this section to give a more complete picture of jet substructure. Since our focus is on reconstructing heavy particles, we discuss the difficulties that arise in interpreting jet substructure.

Our parton-level studies can be briefly summarized. In Sec. III we used a toy model for QCD splittings in jets that contained the dominant soft and collinear physics of QCD, and studied the kinematics for fixed $m/p_T$ of the parent parton in the splitting. In Sec. IV we looked at $1 \to 2$ decays and $1 \to 3$ two-step decays with fixed boost, requiring that the decay be reconstructed in a jet. For the two-step top quark decay, requiring full reconstruction of the top (including the $W$ as a subjet) from the three final state quarks imposed kinematic restrictions that depended on the algorithm used. These studies led to the $z$ and $\Delta R_{12}$ distributions seen in Figs. … and 7 (QCD), … and 13 (one-step decays), and 17 and 18 (two-step decays). We can see that the distributions in $\Delta R_{12}$ are quite similar, but that QCD splittings tend to have smaller $z$ values than heavy particle decays. However, the kinematics of a heavy particle decay are not always simple to detect in a jet that includes showering, as our hadron-level studies have demonstrated.

The QCD shower and the jet algorithm both play a significant role in shaping the jet substructure. The ordering of recombinations for the $k_T$ and CA algorithms imposes significant kinematic constraints on the phase space for the last recombinations in a jet. This leads to kinematic distributions for the last recombination in a jet that depend as much on the algorithm as on the underlying physics of the jet. For instance, in Figs. 9a-9f we find that the kinematics of the last recombination in QCD jets are very different between the $k_T$ and CA algorithms. In particular, we can compare Figs. 9a and 9b, the distribution in $z$ of the last recombination for QCD jets, with Fig. 19, the distribution in $z$ of the last recombination for jets in a $t\bar{t}$ sample that reconstruct the top quark mass. For the $k_T$ algorithm, the differences reflect the different physics of QCD splittings and decays. However, the CA algorithm has shaped the distributions to have a large enhancement at small $z$ for both processes. This implies that it is difficult to discern the physics of the jet simply from the value of $z$ in the last recombination for CA. For the $k_T$ algorithm, because of the ordering of recombinations, the final recombinations better discriminate between decays and QCD, but the mass resolution is poorer than for CA. In Fig. 20 we see that the mass distribution of a reconstructed top quark is degraded for the $k_T$ algorithm relative to CA.

There is one more important contribution to jet substructure, common to QCD jets and heavy particle decays, that we have not yet discussed. This is the combined effect of splash-in from several sources: soft radiation from other parts of the hard scattering, from the underlying event (UE), i.e., from the rest of the individual $pp$ scattering, and from pile-up, i.e., from other collisions that occur in the same time bin. All of these sources add particles to jets that are typically soft and approximately uncorrelated. Splash-in particles will mostly be located at large angles to the jet core, simply because there is more area there. How these particles affect jet substructure depends on the algorithm used, and we expect them to contribute similarly to soft radiation from the QCD shower, discussed at the ends of Secs. III and IV. For concreteness, we now briefly examine the effect of adding a UE to our Monte Carlo events. We expect other splash-in effects to be similar.

In Fig. 21 we show the effect of adding a UE on jet masses. The effect here is simple: adding extra energy to jets pushes the mass distribution higher. Note that for top jets the mass peak has also broadened, making it harder to find the signal mass bump over the background distribution. In Fig. 22 we show how the distributions in $z$ and $\Delta R_{12}$ are affected by the UE. Due to the extra radiation at large angles from the UE, the distribution in the angle of the last recombination, $\Delta R_{12}$, is systematically shifted to larger values. The UE populates the same region in the jet as soft radiation from the hard partons, so the distribution in $z$ is not significantly altered by the UE.
FIG. 21: Distribution in $m_J$ with and without underlying event for QCD and top jets, using the CA and $k_T$ algorithms. The jets have $p_T$ between 500 and 700 GeV, and $D = 1.0$. The samples are described further in Appendix A.

FIG. 22: Distributions in $z$ and $\Delta R_{12}$ with and without underlying event for QCD and top jets, using the CA and $k_T$ algorithms. The jets have $p_T$ between 500 and 700 GeV, and $D = 1.0$. The samples are described further in Appendix A.

We have seen numerous examples showing that the kinematics of the jet substructure in the last recombination for CA are a poor indicator of the physics of the jet. However, we can characterize the aberrant substructure very simply. For the CA algorithm, late recombinations (necessarily at large $\Delta R$) with small $z$ are more likely to arise from systematic effects of the algorithm than from the dynamics of the underlying physics in the jet. For the $k_T$ algorithm, the poor mass resolution of the jet arises from earlier recombinations of soft protojets. The last recombination for $k_T$ is representative of the physics of the jet, but the degraded mass resolution makes it difficult to efficiently discriminate between jets reconstructing heavy particle decays and QCD jets. While small-$z$, large-$\Delta R$ recombinations are not as frequent late in the $k_T$ algorithm as in CA, they contribute the most to the poor mass resolution of $k_T$.

As a simple example of the sensitivity of the mass to small-$z$, large-$\Delta R$ recombinations, consider the recombination $i, j \to p$ of two massless objects in the small-angle approximation. The mass of the parent $p$ is given by $m_p^2 = p_{Tp}^2\, z(1-z)\, \Delta R_{ij}^2$, as in Eq. (…). Suppose the value of the $k_T$ recombination metric, $\rho_{ij}(k_T) = p_{Tp}\, z\, \Delta R_{ij}$, is bounded below by a value $\rho_0$ (say by previous recombinations), and the recombination $i, j \to p$ occurs at $\rho_{ij}(k_T) = \rho_0$. Then the mass of the parent is $m_p^2 = \rho_0^2\,(1-z)/z$, which is maximized for small $z$. Therefore, at a given stage of the algorithm, small-$z$ recombinations have a large effect on the mass of the jet.
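The scaling argument can be checked numerically. This short Python sketch (function names are hypothetical) fixes the $k_T$ metric $\rho = p_{Tp}\, z\, \Delta R_{ij}$ at a value $\rho_0$, solves for $\Delta R_{ij}$, and evaluates the small-angle mass; the result reproduces $m_p = \rho_0\sqrt{(1-z)/z}$, independent of $p_{Tp}$. The values $\rho_0 = 10$ GeV and $p_T = 500$ GeV are illustrative choices, not values from the text.

```python
import math

def small_angle_mass(pt, z, dr):
    """Parent mass of two massless objects: m^2 = pT^2 z (1 - z) dR^2 (small-angle limit)."""
    return pt * math.sqrt(z * (1.0 - z)) * dr

def mass_at_fixed_kt_metric(rho0, z, pt=500.0):
    """Mass of a recombination occurring exactly at kT metric rho = pT * z * dR = rho0."""
    dr = rho0 / (pt * z)  # opening angle needed to reach rho0 at this z
    return small_angle_mass(pt, z, dr)  # equals rho0 * sqrt((1 - z) / z)

for z in (0.05, 0.10, 0.25, 0.50):
    print(f"z = {z:.2f}: m_p = {mass_at_fixed_kt_metric(10.0, z):.2f} GeV")
```

With $\rho_0 = 10$ GeV this gives roughly 43.6, 30.0, 17.3 and 10.0 GeV: at a given value of the metric, the softest recombination contributes the most mass.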

When we can resolve the mass scales of a decay in a jet, the distributions of kinematic variables closely match what we expect from the parton-level kinematics of the decay. For the example of the top quark decay, if we select jets with the top mass that have a daughter subjet with the $W$ mass, the kinematic distributions of $z$ and $\Delta R_{12}$ closely match the distributions from the parton-level decay of the top quark. We show this in Fig. 23, where we make a top quark "hadron-parton" comparison for $z$ and $\Delta R_{12}$. In the hadron-level events, we take jets from $t\bar{t}$ production and make a cut either on the jet mass alone, requiring a mass near the top mass, or on both the jet mass and the subjet mass, requiring proximity to the top and $W$ masses. The specifics of the mass cuts are described in Sec. VII. In the parton-level events, we simply require that the top quark decay to three partons be fully reconstructed by the algorithm in a single jet, namely that the $W$ is correctly recombined first from its decay products before recombination with the $b$ quark to make the top. The parton-level events have the same distribution of top quark boosts as the top jets in the hadron-level events. It is clear that simply requiring that the hadron-level jet have the top mass, which makes no cut on the substructure, leads to kinematic distributions in $z$ and $\Delta R_{12}$ for CA that do not match the parton-level distributions, although the distributions do match quite well for the $k_T$ algorithm. The excess of small-$z$ recombinations for CA in the hadron-level jet with only a jet mass cut arises from the jet algorithm effects discussed previously. After the subjet mass cut, these are removed and the distribution of $z$ in the jet matches the reconstructed parton-level decay very well.
FIG. 23: Distributions in $z$ and $\Delta R_{12}$ comparing top quark decays at the parton level and from Monte Carlo events. The jets have $p_T$ between 500 and 700 GeV, and $D = 1.0$. The parton-level top decays have the same distribution of boosts as the Monte Carlo top jets. Jets in the upper plots have a mass cut on the jet; the lower plots include a subjet mass cut. The details of these cuts are described in Sec. VII.

Therefore, when we can accurately reconstruct the mass scales of a decay in a jet, the kinematics of the jet substructure tend to reproduce the parton-level kinematics of the decay. This suggests that if we can reduce systematic effects that generate misleading substructure, we can improve heavy particle identification and separation from background. Reducing these systematic effects can also improve the mass resolution of the jet, which will aid in identifying a heavy particle decay reconstructed in a jet and in rejecting the QCD background. We now discuss a technique that aims to accomplish this goal.



VI. THE PRUNING PROCEDURE 

In this section we define a technique that modifies the 
jet substructure to reduce the systematic effects that ob- 
scure heavy particle reconstruction. In general, we will 
think of a pruning procedure as using a criterion on kine- 
matic variables to determine whether or not a branching 
is likely to represent accurate reconstruction of a heavy 
particle decay. This takes the form of a cut: if a branch- 
ing does not pass a set of cuts on kinematic variables, 
that recombination is vetoed. This means that one of 
the two branches to be combined (determined by some 
test on the kinematics) is discarded and the recombina- 
tion does not occur. 

In Sec. V we identified recombinations that are unlikely to represent the reconstruction of a heavy particle. These can be characterized in terms of the variables $z$ and $\Delta R$: recombinations with large $\Delta R$ and small $z$ are much more likely to arise from systematic effects of the jet algorithm and from QCD jets than from heavy particle reconstruction. We expect that removing (pruning) these recombinations will tend to improve our ability to measure the mass of a jet reconstructing a heavy particle. We also expect that this procedure will systematically shift the QCD mass distribution lower, reducing the background in the signal mass window. Finally, this procedure is expected to reduce the impact of uncorrelated soft radiation from the underlying event and pile-up. We therefore define the following pruning procedure:

0. Start with a jet found by any jet algorithm, and collect the objects (such as calorimeter towers) in the jet into a list $L$. Define parameters $D_{\rm cut}$ and $z_{\rm cut}$ for the pruning procedure.

1. Rerun a jet algorithm on the list $L$, checking for the following condition in each recombination $i, j \to p$:

$$
z = \frac{\min(p_{Ti}, p_{Tj})}{p_{Tp}} < z_{\rm cut} \quad \text{and} \quad \Delta R_{ij} > D_{\rm cut}.
$$

This algorithm must be a recombination algorithm such as the CA or $k_T$ algorithms, and should give a "useful" jet substructure (one where we can meaningfully interpret recombinations in terms of the physics of the jet).

2. If the conditions in 1. are met, do not merge the two branches $i$ and $j$ into $p$. Instead, discard the softer branch, i.e., veto the merging. Proceed with the algorithm.

3. The resulting jet is the pruned jet, and can be compared with the jet found in Step 0.
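The steps above can be sketched in Python. This is a minimal, unoptimized illustration, not the authors' implementation: it assumes massless input particles given by $(p_T, y, \phi)$, four-vector (E-scheme) recombination, an $O(n^3)$ pairwise search, and reclustering of the jet's constituents until a single object remains (no jet radius in Step 1). The defaults $z_{\rm cut} = 0.10$ and $D_{\rm cut} = m_J/p_{TJ}$ of the original jet follow the starting values suggested later in this section. All names (`P4`, `massless`, `prune`) are hypothetical; a real analysis would use an established jet clustering library.

```python
import math
from dataclasses import dataclass
from functools import reduce
from operator import add

@dataclass(frozen=True)
class P4:
    """Four-momentum (px, py, pz, E); '+' is E-scheme recombination."""
    px: float
    py: float
    pz: float
    e: float

    def __add__(self, other):
        return P4(self.px + other.px, self.py + other.py,
                  self.pz + other.pz, self.e + other.e)

    @property
    def pt(self):
        return math.hypot(self.px, self.py)

    @property
    def y(self):
        return 0.5 * math.log((self.e + self.pz) / (self.e - self.pz))

    @property
    def phi(self):
        return math.atan2(self.py, self.px)

    @property
    def m(self):
        return math.sqrt(max(self.e**2 - self.px**2 - self.py**2 - self.pz**2, 0.0))

def massless(pt, y, phi):
    """Massless particle from transverse momentum, rapidity and azimuth."""
    return P4(pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(y), pt * math.cosh(y))

def delta_r(a, b):
    dphi = abs(a.phi - b.phi) % (2 * math.pi)
    return math.hypot(a.y - b.y, min(dphi, 2 * math.pi - dphi))

def distance(a, b, algorithm):
    """CA orders by angle only; kT by min(pT_i, pT_j) * dR_ij (no beam distance: one jet)."""
    dr = delta_r(a, b)
    return dr if algorithm == "ca" else min(a.pt, b.pt) * dr

def prune(constituents, z_cut=0.10, d_cut=None, algorithm="ca"):
    """Steps 1-3: recluster the jet's constituents, vetoing soft, wide-angle merges."""
    if algorithm not in ("ca", "kt"):
        raise ValueError("algorithm must be 'ca' or 'kt'")
    if d_cut is None:
        jet = reduce(add, constituents)  # the original (Step 0) jet
        d_cut = jet.m / jet.pt           # D_cut = m_J / p_TJ
    objs = list(constituents)
    while len(objs) > 1:
        pairs = ((i, j) for i in range(len(objs)) for j in range(i + 1, len(objs)))
        i, j = min(pairs, key=lambda ij: distance(objs[ij[0]], objs[ij[1]], algorithm))
        a, b = objs[i], objs[j]
        p = a + b
        z = min(a.pt, b.pt) / p.pt
        vetoed = z < z_cut and delta_r(a, b) > d_cut
        del objs[j], objs[i]  # j > i, so removing j first keeps index i valid
        objs.append((a if a.pt >= b.pt else b) if vetoed else p)  # Step 2: drop softer branch
    return objs[0]

hard1, hard2 = massless(200.0, 0.0, 0.0), massless(150.0, 0.3, 0.0)
soft = massless(5.0, 0.0, 0.9)                     # soft, wide-angle "splash-in"
unpruned = prune([hard1, hard2, soft], z_cut=0.0)  # z_cut = 0 never vetoes
pruned = prune([hard1, hard2, soft])
print(f"unpruned m = {unpruned.m:.1f} GeV, pruned m = {pruned.m:.1f} GeV")
```

In this toy jet the last CA recombination attaches the 5 GeV particle with $z \approx 0.014$ at $\Delta R \approx 0.91$, well above $D_{\rm cut} \approx 0.18$, so it is vetoed and the pruned mass is that of the two hard particles alone.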

This technique is intended to be generically applicable in heavy particle searches. It generalizes analysis techniques suggested by other authors [9], [11], in that these methods also modify the jet substructure to help separate a particular signal from backgrounds. We emphasize that pruning can be broadly applied. We have endeavored to justify this claim with the discussions in Secs. III-V, which demonstrate that the interpretation of jet substructure is subject to systematic effects that can be well characterized. Pruning is not the only option, but it offers some advantages, which we explore in the studies below.

In the analysis of pruning, we will explore the dependence of the pruned jets on the value of $D$ in the jet algorithm. When reconstructing a boosted heavy particle in a single jet without pruning, the reconstruction is optimized if the value of $D$ is matched to the expected opening angle of the decay. However, this angle depends on the mass of the particle (which is not known in a search) and on its $p_T$. We will show that pruning reduces the sensitivity to $D$ and allows one to use large-$D$ jets over a broad range in $p_T$ to search for heavy particles. This makes a search with pruning much more straightforward to carry out.

Values for the two parameters of the pruning procedure, $z_{\rm cut}$ and $D_{\rm cut}$, can be well motivated. In the following studies, we will show that the results of pruning are rather insensitive to the parameters, and that the optimal parameters are similar for different searches. That is, it is not necessary to tune the pruning procedure for individual searches.

The parameter $z_{\rm cut}$ can be chosen based on the analysis of single-step and multi-step decays in Sec. IV. Near the limit in boost where decays are reconstructed in a single jet, the value of $z$ is typically large. It is only at large boosts, where the production rate of heavy particles is much smaller, that small values of $z$ are allowed for reconstructed decays. Therefore, we can choose a value of $z_{\rm cut}$ that will keep all reconstructed parton-level decays at small boost, and only remove a small fraction of decays at larger boosts. For both the $k_T$ and CA algorithms, we set $z_{\rm cut} = 0.10$ initially, and will study the performance of pruning as $z_{\rm cut}$ is varied for different searches.

The parameter $D_{\rm cut}$ can be determined on a jet-by-jet basis, allowing pruning to be more adaptive than a fixed-parameter procedure. $D_{\rm cut}$ essentially determines how much of the jet substructure can be pruned, with smaller values allowing for more pruning. $D_{\rm cut}$ should be sufficiently small that if a decay is "hidden" inside the jet substructure by late recombinations of, say, UE particles, the substructure can be pruned and the decay can be found. A value that is too small, however, will result in over-pruning. A natural scale for $D_{\rm cut}$ is the opening angle of the jet. However, this is an infrared-unsafe quantity, as soft radiation can change the opening angle. Instead, the dimensionless ratio $m_J/p_{TJ}$ for the jet is related to the opening angle: typically, $\Delta R_{12} \sim 2 m_J/p_{TJ}$. Therefore, we choose $D_{\rm cut}$ to scale with $2 m_J/p_{TJ}$, and a value $D_{\rm cut} = m_J/p_{TJ}$ is a reasonable starting value. We will study the performance of pruning as a function of the scaling of $D_{\rm cut}$ with $2 m_J/p_{TJ}$.
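The factor of 2 can be seen from the small-angle mass relation used in Sec. V, under the simplifying assumption that the jet mass is dominated by a last recombination of two approximately massless subjets. Solving for the opening angle and using $z(1-z) \le 1/4$:

$$
m_J^2 \simeq p_{TJ}^2\, z(1-z)\, \Delta R_{12}^2
\quad\Longrightarrow\quad
\Delta R_{12} \simeq \frac{m_J}{p_{TJ}\sqrt{z(1-z)}} \;\ge\; \frac{2\, m_J}{p_{TJ}},
$$

with equality for a symmetric splitting, $z = 1/2$. In this approximation the starting value $D_{\rm cut} = m_J/p_{TJ}$ is half of the smallest opening angle compatible with the jet mass.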



A. Effects of Pruning 

Having defined the pruning procedure, we can demonstrate how effective it is in reducing systematic effects and improving the mass resolution of jets. In this study, we use the parameters $D_{\rm cut} = m_J/p_{TJ}$ for both algorithms, and $z_{\rm cut} = 0.10$ for the CA algorithm and $0.15$ for the $k_T$ algorithm. We will motivate these parameters with the study in Sec. VIII A. First, in Fig. 24 we reproduce the "hadron-parton" comparison of Fig. 23 from Sec. V, using pruning at both the hadron level and the parton level. The parton-level pruning is implemented in the same way as defined above, treating the three partons of the reconstructed top quark as the jet.

It is clear by comparing Figs. 23 and 24 that pruning has removed much of the systematic effects in the CA algorithm: when only a jet mass cut is made, the distributions in $z$ and $\Delta R_{12}$ for pruned jets match the parton-level distributions much better than for unpruned jets. When both the jet mass and subjet mass cuts are made, pruning shows slightly poorer agreement with the parton-level kinematics than the unpruned case. This arises from the fact that the value of $z_{\rm cut}$ is fixed, while the distribution in $z$ depends on the kinematics of the decay.

FIG. 24: Distributions in $z$ and $\Delta R_{12}$ comparing top quark decays at the parton level and from Monte Carlo events after implementing pruning. This figure uses the same samples and cuts as Fig. 23.

FIG. 25: Distributions in $m_J$ with and without underlying event, for QCD and top jets, using the CA algorithm, with and without pruning. The jets have $p_T$ between 500 and 700 GeV, and $D = 1.0$.

In addition to improving the kinematics of the jet substructure, pruning reduces the contribution of the underlying event and improves the mass resolution of reconstructed decays. In Figs. 25 and 26 we give the mass distribution of jets with and without the UE in both the QCD and $t\bar{t}$ samples for the CA and $k_T$ algorithms, now with and without pruning. In Figs. 27 and 28 we show the effect of the UE on the distributions in $z$ and $\Delta R_{12}$, also with and without pruning.

Three distinctions between pruned and unpruned jets are clear. First, the distributions with and without the UE are very similar for pruned jets, while they noticeably differ for unpruned jets. This shows that pruning has drastically reduced the contribution of the underlying event. Second, the mass peak of jets near the top quark mass in the $t\bar{t}$ sample is significantly narrowed by pruning (especially when the UE is included). This is evidence of the improved mass resolution of pruned jets, and will contribute to the improvement in heavy particle identification with pruning. Finally, the mass distribution of QCD jets is pushed significantly downward by pruning. The QCD jet mass is dominantly built from soft, large-angle recombinations: most recombinations are soft, and for fixed $p_T$, larger-angle recombinations contribute more to the jet mass. Removing these by pruning the jets reduces the QCD mass distribution in the large-mass range and will contribute to the reduction of the QCD background.

FIG. 26: Distributions in $m_J$ with and without underlying event, for QCD and top jets, using the $k_T$ algorithm, with and without pruning. The jets have $p_T$ between 500 and 700 GeV, and $D = 1.0$.

We move on to examine pruning through a set of studies using Monte Carlo simulated events. We will investigate the parameter dependence of pruning, motivating the parameters used above. We will extensively study both top and $W$ reconstruction with pruning, and quantify the improvements of pruning in terms of basic statistical measures. These studies will provide evidence of the insensitivity of pruning to the value of $D$ in the jet algorithm.

FIG. 27: Distribution in $z$ with and without underlying event, for QCD and top jets, using the CA and $k_T$ algorithms, with and without pruning. The legends for plots (c) and (d) correspond to (a) and (b), respectively. The jets have $p_T$ between 500 and 700 GeV, and $D = 1.0$.

VII. MONTE CARLO STUDIES

A. Study Layout

The parameter space for questions about pruning procedures is very large. In this work, we want to ask whether pruning is a viable data analysis tool, and how effective it can be. We use Monte Carlo samples to study $W$ reconstruction and the rejection of … + jets backgrounds, as well as top quark reconstruction and the rejection of QCD multijet backgrounds. To test the usefulness of pruning across a range of jet $m/p_T$, and hence of heavy particle boosts, we study both signals in four $p_T$ bins. We will also be able to compare a signal with a single mass scale (the $W$) to one with two (the top). The details of the Monte Carlo samples and their generation are described in Appendix A.

In the following sections, we define a particular method to identify heavy particles using jet substructure, and examine pruning in this context. In this work, we are more concerned with the improvements provided by pruning than with its absolute performance. Therefore, we compare pruning to an analysis procedure where the jets are left unpruned. This comparison removes dependence on quantities that have large uncertainties, such as signal and background cross sections, or that are not specified, such as the integrated luminosity. Instead, the performance of pruning is quantified in terms of key measures: how much better pruning resolves the physically relevant substructure of the jet and separates signal and background processes than using the substructure from unpruned jets.

Additionally, we test the performance of pruning as parameters of the jet algorithm and the pruning procedure are varied. The performance will change with the parameter $D$, since it controls how boosted the decay must be to be reconstructed in a single jet. We expect the $D$ dependence to be closely correlated with the jet $p_T$, as it is a direct measure of the boost of the heavy particle. We also test the sensitivity of the pruning procedure to the parameters $z_{\rm cut}$ and $D_{\rm cut}$. We aim to draw some basic conclusions about how pruning should be applied in a search.

B. Measures used to quantify pruning

FIG. 28: Distribution in $\Delta R_{12}$ with and without underlying event, for QCD and top jets, using the CA and $k_T$ algorithms, with and without pruning. The jets have $p_T$ between 500 and 700 GeV, and $D = 1.0$.



Mass variables are by far the strongest discriminator between QCD jets and jets reconstructing heavy particle decays. QCD jets have a smooth mass distribution set by the jet $p_T$ (see Sec. III), while a decaying particle can have multiple intrinsic mass scales. This allows us to define simple criteria to identify a jet as coming from a top quark: if the jet mass is in the top mass window and one of the two subjets has a mass in the $W$ mass window, then we tag the jet as a top jet. The top and $W$ mass windows are defined by fitting the relevant mass peaks of the signal sample, which we describe in detail below.
The W study proceeds analogously with only a jet mass cut. In a real search for a particle of unknown mass, one obviously cannot fit a "signal sample". However, we employ this method to demonstrate two effects of pruning: sharpening the signal mass peak and reducing the QCD background in this region. These two effects determine how well pruning improves our ability to find bumps in jet mass distributions.
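The top and W tagging criteria above can be written as short predicates. This is a minimal sketch: the function names and the window edges are hypothetical placeholders, since in the study the windows come from fits to the signal sample. "The two subjets" is taken to mean the two daughters of the final recombination, as in the discussion of Sec. VIII A.

```python
def in_window(mass, window):
    """Return True if mass lies inside the closed interval window = (lo, hi)."""
    lo, hi = window
    return lo <= mass <= hi


def is_top_jet(jet_mass, subjet_masses, top_window, w_window):
    """Top tag: jet mass in the top window AND at least one of the two
    subjets (daughters of the final recombination) in the W window."""
    return in_window(jet_mass, top_window) and any(
        in_window(m, w_window) for m in subjet_masses
    )


def is_w_jet(jet_mass, w_window):
    """W tag: only the jet mass cut is applied."""
    return in_window(jet_mass, w_window)


# Hypothetical window edges in GeV, for illustration only.
TOP_WINDOW = (160.0, 190.0)
W_WINDOW = (70.0, 90.0)

print(is_top_jet(174.0, (81.0, 35.0), TOP_WINDOW, W_WINDOW))  # True
print(is_top_jet(174.0, (55.0, 60.0), TOP_WINDOW, W_WINDOW))  # False: no W subjet
print(is_w_jet(95.0, W_WINDOW))  # False
```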

Jet algorithms can be compared using a variety of measures depending on how the algorithm is used. Our focus is on heavy particle identification and separation from background. In particular, we compare analyses performed with and without pruning to quantify the improvement that pruning provides. We use a common set of variables to measure the relative difference between a jet algorithm and its pruned version. Let N_S(A) be the number of jets in the signal sample identified as a reconstructed heavy particle for algorithm A, and N_B(A) the analogous number of jets in the background sample. We use pA to denote the pruning procedure run on jets found with algorithm A. Then the variables we use are:

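$$\epsilon = \frac{N_S(pA)}{N_S(A)}, \qquad R = \frac{N_S(pA)/N_B(pA)}{N_S(A)/N_B(A)}, \qquad S = \frac{N_S(pA)/\sqrt{N_B(pA)}}{N_S(A)/\sqrt{N_B(A)}} \qquad (31)$$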



ε is the relative efficiency of pruning in identifying heavy particles in the signal sample, while R and S are the relative signal-to-background and signal-to-noise ratios for the pruned and unpruned algorithms. We also evaluate the relative mass window widths, which we label w_rel. For the W study, this is the ratio of the W mass window width with pruning to that without pruning; for the top study it is the ratio of the top mass window widths. Note that in the top study, a W subjet mass cut is also used. A value of w_rel < 1 means pruning has improved the mass resolution of the jets. These ratios are independent of the integrated luminosity and the total cross sections, and are representative of the improvements that pruning would provide in an analysis.
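As a sketch of how the ratios in Eq. (31) and w_rel are formed from raw counts, the snippet below computes them directly. The function names and the counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def relative_measures(ns_a, nb_a, ns_pa, nb_pa):
    """Relative measures of Eq. (31).

    ns_a, nb_a   -- signal / background jets tagged with algorithm A (unpruned)
    ns_pa, nb_pa -- the same counts after pruning (pA)
    """
    eps = ns_pa / ns_a
    r = (ns_pa / nb_pa) / (ns_a / nb_a)
    s = (ns_pa / math.sqrt(nb_pa)) / (ns_a / math.sqrt(nb_a))
    return eps, r, s


def relative_width(width_pruned, width_unpruned):
    """w_rel < 1 means pruning narrowed the fitted mass window."""
    return width_pruned / width_unpruned


# Hypothetical counts: pruning keeps 90% of the signal and 25% of the background.
eps, r, s = relative_measures(ns_a=1000, nb_a=4000, ns_pa=900, nb_pa=1000)
print(eps, r, s)  # approximately 0.9, 3.6, 1.8
print(relative_width(15.0, 20.0))  # 0.75
```

Dividing through gives R = ε·N_B(A)/N_B(pA) and S = ε·√(N_B(A)/N_B(pA)), so R > 1 when the fraction of background kept by pruning is below ε, and S > 1 when it is below ε².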

To determine the mass window for a particular signal sample, we fit the mass peak to determine the window width. In these studies, a skewed Breit-Wigner with a power-law continuum background is sufficient to fit the peak. The functions used to fit mass peaks are:

$$\text{peak: } f(m) = \frac{a + b\,(m - M)}{(m^2 - M^2)^2 + M^2\Gamma^2}, \qquad \text{continuum: } g(m) = \frac{c}{m} + \frac{d}{m^2} \qquad (32)$$



M is the location of the mass peak and Γ is the width of the peak; a, b, c, and d are fit parameters. A sample fit is shown in Fig. 29.

FIG. 29: A sample fit showing the jet mass distribution (black histogram) and sample fit (blue curve) for CA jets from tt̄ events.

The mass window [M − Γ, M + Γ] is found to be nearly optimal, given this functional form, in measures similar to ε, R, and S: the area in the window (∼ ε), the ratio of the area to the window width (∼ R), and the ratio of the area to the square root of the width (∼ S). In the next section, we use these statistics to quantify the improvements gained by pruning in identifying heavy particles and separating them from backgrounds.

VIII. STUDY RESULTS

In this section we present results comparing analyses with pruned jets to analyses with unpruned jets. We demonstrate two main points: first, pruning is useful and broadly applicable, and second, its parameters do not need fine tuning for it to provide significant improvement.

The natural starting point is to investigate the parameters particular to the pruning procedure, D_cut and z_cut. The most important question is whether these need to be tuned to the signal. To answer this, in Sec. VIII A we study the performance of pruning as we vary its parameters for two different signals across the full pT range of the samples. We find that the optimal choices of z_cut and D_cut vary slowly with m/pT, but that our choice of parameters is not far from optimal in all cases.

After fixing z_cut and D_cut, we consider the effect of varying D in the jet algorithm. In Sec. VIII B we study pruning with D fixed at 1.0 over all pT bins. This type of analysis resembles a search where the mass (and hence m/pT) of the new heavy particle is not known. For comparison, in Sec. VIII C we redo the analysis with D adjusted in each bin to fit the expected angular size of the decay in that bin. In this case the unpruned jet algorithm performs better than with a constant D, as expected, but pruning still improves W and top finding. In all cases, pruned jets identify heavy particles better than unpruned jets. In Sec. VIII D we compare the results of Secs. VIII B and VIII C. Significantly, if jets are pruned, we find that the initial D value makes little difference, indicating that searches with a large fixed D do not lose power compared to searches with D tuned to a known or suspected m/pT.

In Sec. VIII E we give some absolute measures of top finding with pruned jets for comparison to other methods. In Sec. VIII F we directly compare the CA and kT algorithms, before and after pruning. Finally, in Sec. VIII G we consider the effect of a crude detector model in which we smear the energies of all particles in the calorimeter. We find that the performance of both the pruned and unpruned algorithms is degraded, but that pruning still provides significant improvement.



A. Dependence on Pruning Parameters 

The pruning procedure we have defined has two free parameters (in addition to those of the jet algorithms themselves). In introducing the procedure, we argued that z_cut = 0.10 and D_cut = m_J/p_TJ were sensible choices. In this subsection we investigate how pruning performs when each of these parameters is varied while the other is held fixed, for both (W and top) signals and across the four pT bins for each signal.

We look at the values of the metrics w_rel, ε, R, and S defined in Sec. VII B. The priority in choosing particular values for z_cut and D_cut should be optimizing S, since it is the criterion for discovery. That said, ε and R are still important measures, as they determine the total size of the signal and its fraction relative to the background. As we will see, the dependence of ε and R on the parameters is not strong. We also evaluate w_rel because the mass window width drives the other three metrics. As the relative width decreases, the measures R and S will in general increase because the heavy particle is better resolved and more of the background is rejected, but ε will decrease simply because the narrower window selects fewer signal jets.
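The trade-off described above, in which a narrower window raises R and S but lowers ε, follows from how the window is built. The sketch below assumes the peak and continuum forms of Eq. (32) and the window [M − Γ, M + Γ] of Sec. VII B; the function names and numbers are hypothetical, and the fit itself (which could be done with a general least-squares fitter such as scipy.optimize.curve_fit) is left out to keep the example dependency-free.

```python
def peak(m, a, b, M, gamma):
    """Skewed Breit-Wigner peak, assuming the form of Eq. (32)."""
    return (a + b * (m - M)) / ((m**2 - M**2) ** 2 + M**2 * gamma**2)


def continuum(m, c, d):
    """Power-law continuum of Eq. (32): c/m + d/m^2."""
    return c / m + d / m**2


def mass_model(m, a, b, M, gamma, c, d):
    """Full fit model: peak plus continuum."""
    return peak(m, a, b, M, gamma) + continuum(m, c, d)


def mass_window(M, gamma):
    """Selection window [M - Gamma, M + Gamma] from the fitted peak."""
    return (M - gamma, M + gamma)


def count_in_window(masses, window):
    """Number of jets whose mass falls inside the window."""
    lo, hi = window
    return sum(1 for m in masses if lo <= m <= hi)


# Hypothetical jet masses (GeV): shrinking Gamma from 10 to 5 GeV keeps fewer jets.
masses = [65.0, 72.0, 80.0, 88.0, 95.0]
print(count_in_window(masses, mass_window(80.0, 10.0)))  # 3
print(count_in_window(masses, mass_window(80.0, 5.0)))   # 1
```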

In Fig. 30 we show all four metrics for top and W jets, for both CA and kT jets. D_cut is set to m_J/p_TJ throughout, and z_cut is varied in [0, 0.25]. The value z_cut = 0 represents no pruning, and we can see that all metrics are 1 there. With increasing pruning, the mass window width initially decreases rapidly, then levels out. In all but the smallest pT bin, the relative signal efficiency ε increases as the width narrows, suggesting that signal jets that had "vacuumed up" too much UE or soft radiation are being pruned back into the mass window. Note that for the top quark sample with the kT algorithm, ε merely flattens out over a range in z_cut and does not increase as it does for the other samples. Once the window stops shrinking significantly (around z_cut = 0.05), the relative signal efficiency starts decreasing; now the dominant effect is over-pruning of signal jets out of the mass window. Note, however, that even though the relative signal efficiency is decreasing, the relative signal-to-background ratio R increases over the full range. So even as signal jets are being removed from the mass window, background jets are being removed even faster. If we look at the signal-to-noise ratio S, there appears to be a broad optimal range in z_cut that depends somewhat on the signal, the pT bin, and the jet algorithm.

There are two important lessons to be learned from these plots. First, more pruning is required for kT jets than for CA jets to achieve similar results: the right two columns (kT) are similar to the left two (CA), except that features are shifted out in z_cut. Second, the peak in S does not depend strongly on the signal or the pT in the three largest pT bins. The dependence of S on z_cut in the smallest pT bin, however, differs from the others because of threshold effects for reconstructing the heavy particle in a single jet. In the smallest pT bin, the boosts of the W's or tops are small enough that many decays are just at the threshold for being reconstructed. Decays at the reconstruction threshold typically have poor mass resolution, and cutting more aggressively on z reduces these threshold effects and significantly decreases the background, leading to an increase in S over the whole range in z_cut. For CA, our "reasonable choice" of z_cut = 0.1 looks close to optimal for the upper three bins, and not far off for the smallest. For kT, a larger z_cut is needed; 0.15 is close to optimal.

Additionally, these plots offer an interesting perspective on how the substructure depends on z for both algorithms. The tt̄ sample for the CA algorithm is the most instructive. In this case, small values of z_cut lead to dramatically increased efficiency for finding top jets in the larger pT bins. This is due to the improved ability after pruning to find the W as a subjet of the top. At large pT with a fixed D = 1.0, the opening angle of the top quark decay is much smaller than D. This means that the top quark decay is very localized in the jet, and much of the jet area contains soft radiation. For the CA algorithm, which recombines solely on the basis of the angle between protojets, this tends to delay the recombination of soft peripheral radiation until the end of the algorithm. The result is substructure with small z at the last recombination that is not representative of the top quark decay: neither daughter protojet of the top has the W mass. As an illustration of this point, in Fig. 31 we plot the distribution of z for unpruned jets in the top mass range for the CA algorithm in the largest and smallest pT bins. Note that in the largest pT bin, where the top quark decay is highly localized in the jet and the decay angle is much less than D, there is a substantially increased fraction of jets with a small value of z. This does not occur in the smallest pT bin, where most of the reconstructed tops are at the threshold of fitting inside the jet. When pruning is implemented, however, much of this soft radiation is removed. In Fig. 32 we plot the same distributions as in Fig. 31, but for pruned jets. In this case, no jets with the top mass have small z, since pruning has removed those recombinations. This leads to a highly enhanced efficiency to resolve the W subjet and identify the jet as a top jet. In Sec. VIII C we will study pruning when the value of D is matched to the average angle of the heavy particle decay, and we will see that the performance of the unpruned CA algorithm improves.

FIG. 30: Relative statistical measures w_rel, ε, R, and S vs. z_cut for W's and tops, using CA and kT jets. Four pT bins are shown for each sample. Statistical errors (not shown) are O(1%) for w_rel and ε, and O(10%) for R and S.

FIG. 31: Distribution in z for unpruned CA jets in the top mass window for two pT bins: (a) pT bin 1, 200-500 GeV; (b) pT bin 4, 900-1100 GeV. The small-pT bin distribution (left) has only a small enhancement of entries at small z, while the large-pT bin distribution (right) is dominated by small z.

By contrast, this situation does not occur for the kT algorithm. Even when the value of D is mismatched with the top quark decay angle, the soft radiation on the periphery of the jet is recombined early in the kT algorithm because of the pT weighting in the recombination metric. Therefore, there is no increase in efficiency with increasing z_cut at large pT, and the decrease in ε comes from the narrower width of the top and W mass distributions. The small variation in the measures R and S for the kT algorithm at small z_cut is evidence that kT tends to have many fewer small-z recombinations at the end of the algorithm, and supports the larger value z_cut = 0.15 for the kT algorithm that we will use in the remainder of the study.

FIG. 32: Distribution in z for pruned CA jets in the top mass window for two pT bins, using z_cut = 0.10.

We now fix z_cut to study the dependence on D_cut. For the CA algorithm we choose z_cut = 0.1, and for kT we choose 0.15. In Fig. 33 we plot w_rel, ε, R, and S as D_cut is varied in [0, 5 m_J/p_TJ]. While z_cut sets the minimum pT asymmetry that recombinations can have, D_cut sets the minimum opening angle for recombinations that can be pruned. We can think of D_cut as determining which recombinations can be pruned, and z_cut as determining whether or not that pruning takes place. This difference is clearer when we consider two limiting values of D_cut and their impact on the pruned jet substructure.
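The division of labor between z_cut and D_cut can be seen in a minimal sketch of the per-recombination test. It assumes the reclustering loop of the CA or kT algorithm is provided elsewhere and calls this check at each merger i + j → p, where pT_p is the transverse momentum of the merged protojet; when the check fires, the softer of i and j is discarded rather than merged. Using rapidity y and azimuth φ for ΔR is an assumption of this sketch, and all function names are hypothetical.

```python
import math


def delta_r(y1, phi1, y2, phi2):
    """Distance in the (rapidity, azimuth) plane, with the azimuthal
    difference wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(y1 - y2, dphi)


def d_cut_for_jet(m_jet, pt_jet):
    """D_cut = m_J / p_TJ, computed from the original (unpruned) jet."""
    return m_jet / pt_jet


def should_prune(pt_i, pt_j, pt_p, dr_ij, z_cut, d_cut):
    """Pruning test for the recombination i + j -> p.

    d_cut restricts the test to wide-angle recombinations; z_cut then
    decides whether the softer branch is removed. Both must fire."""
    z = min(pt_i, pt_j) / pt_p
    return z < z_cut and dr_ij > d_cut


# Illustrative numbers (GeV): a top-like jet with m_J = 175, p_TJ = 800.
d_cut = d_cut_for_jet(175.0, 800.0)  # about 0.22
print(should_prune(20.0, 500.0, 515.0, 0.6, 0.10, d_cut))   # True: soft, wide angle
print(should_prune(20.0, 500.0, 515.0, 0.1, 0.10, d_cut))   # False: too collinear
print(should_prune(200.0, 300.0, 480.0, 0.6, 0.10, d_cut))  # False: not soft
```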
FIG. 33: Relative statistical measures w_rel, ε, R, and S vs. D_cut/(m_J/p_TJ) for W's and tops, using CA and kT jets. Four pT bins are shown for each sample. Statistical errors (not shown) are O(1%) for w_rel and ε, and O(10%) for R and S.



As D_cut grows past 2m_J/p_TJ, any recombination must have a large opening angle between the daughters to be pruned. Note that the limit D_cut → ∞ is the limit of no pruning. For both the CA and kT algorithms, in this limit only very late recombinations in the algorithm can be pruned (if the jet can be pruned at all), and we expect the statistical measures to tend to one as the amount of pruning decreases.

The second limit is D_cut → 0. In this limit any recombination can be pruned, since the minimum opening angle needed is very small. As D_cut decreases towards zero, more of the jet substructure can be pruned. In particular, earlier recombinations (those with smaller opening angles on average) can be pruned as D_cut decreases. In general, these early recombinations are associated with the QCD shower, and pruning them can degrade the mass resolution of the jet because too much radiation is removed. Therefore, we expect the performance of pruning to be poor in this region.

Both of these limits are visible in Fig. 33, and they behave as expected. It is in the intermediate region, where D_cut ~ m_J/p_TJ, that the performance of pruning is optimal, with a maximum in S that is not very sensitive to the pT bin, sample, or algorithm. The value D_cut = m_J/p_TJ is sensible when we recognize that the average opening angle of the jet is approximately 2m_J/p_TJ, and half this value allows pruning of late recombinations but not of the soft, small-angle recombinations associated with the QCD shower.
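The factor of two can be made explicit with the standard small-angle relation for a two-body splitting with momentum fractions z and 1 − z, m² ≈ z(1 − z) pT² ΔR². This relation is a textbook approximation brought in here for illustration, not a formula from this study, and a three-body top decay only roughly follows it. For a symmetric split (z = 1/2) it gives ΔR ≈ 2m/pT, the scale quoted above, and D_cut is half of that. As a worked example with a top-like jet mass of 175 GeV and pT = 800 GeV (values chosen only for illustration):

$$m_J^2 \approx z(1-z)\,p_{T_J}^2\,\Delta R^2 \;\Longrightarrow\; \Delta R \approx \frac{m_J}{p_{T_J}\sqrt{z(1-z)}} \;\xrightarrow{\;z = 1/2\;}\; \frac{2m_J}{p_{T_J}}$$

$$\Delta R \approx \frac{2 \times 175}{800} \approx 0.44, \qquad D_{\rm cut} = \frac{175}{800} \approx 0.22$$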

For the remainder of the study, we fix the pruning parameters at z_cut = 0.1 for the CA algorithm and z_cut = 0.15 for the kT algorithm, with D_cut = m_J/p_TJ for both algorithms. With these parameters fixed, we move on to more interesting tests of the pruning procedure, namely the improvements conferred by pruning over a range in heavy particle boost and the D dependence of the pruning procedure.

B. Top and W Identification with Constant D 

In a search for heavy particles decaying into jets, it may be infeasible to divide a sample into pT bins and use a tailored jet algorithm to look for local excesses in the jet mass distribution in each pT bin. (A "variable-R" method for avoiding pT binning, which we do not consider here, has recently been suggested [18]. It still requires knowing or guessing the mass of the new particle, since it is m/pT that determines the relevant angular size.) For instance, the appropriate angular scale may be unknown because the mass of the heavy particle is not known or the production mechanism is not well understood (so that the spectrum of heavy particle boosts is not known). In this case, a large-D jet algorithm may be used to search for heavy particles reconstructed in single jets. To mimic such an analysis, and to provide a reference point for further tests of pruning, we compute our statistical measures for W and top quark jets over a range of jet pT bins with a fixed D of 1.0.

In Fig. 34 we plot the values of w_rel, ε, R, and S versus pT bin for W's and tops, using the CA and kT algorithms. For both algorithms, pruning improves W and top finding, with substantial improvements at large pT. The measure S corresponds to an improvement of 30-40% in the smallest pT bins, growing to 100-600% in the largest pT bins. At large pT in the top quark study, the improvement in signal-to-noise for the CA algorithm is larger than for the kT algorithm, as is the relative efficiency to identify tops. This arises because the unpruned CA algorithm is poor at reconstructing the W as a subjet of the top jet at large pT when the value of D is not matched to the opening angle of the decay. We investigate this case further in the rest of the analysis.

FIG. 34: Relative statistical measures w_rel, ε, R, and S vs. pT for W's and tops, using CA and kT jets with D = 1.0. Statistical errors are shown.



C. Top Identification with Variable D 





W:
  pT (GeV):    125-200 | 200-275 | 275-350 | 350-425
  "tuned" D:     1.0   |   0.8   |   0.6   |   0.4

top:
  pT (GeV):    200-500 | 500-700 | 700-900 | 900-1100
  "tuned" D:     1.0   |   0.7   |   0.5   |   0.4

TABLE I: "Tuned" D values for the W and top pT bins. The fixed-D analysis used D = 1.0, so the smallest bin does not change.

For an analysis where the heavy particle mass is known, the jet algorithm can be tailored to the jet pT when searching for the heavy particle reconstructed in a single jet. In this case, the D value can be chosen using the relation

$$D = \min\left(1.0,\; \frac{2m}{p_T}\right) \qquad (33)$$

where m is the heavy particle mass and pT is the transverse momentum of the jet. We take 1.0 to be the maximum allowed value of D. The D values we use are given in Table I. In Fig. 35 we plot w_rel, ε, R, and S for jets with these D values used in each pT bin. Note that Eq. (33) neglects the differences between algorithms, which depend on the particular decay. As an example of the fidelity of this relation for D, recall Fig. 18, which plotted the distribution in ΔR for reconstructed parton-level top quark decays with a top boost of γ = 3. Eq. (33) suggests the value D = 0.7, while the means of the CA and kT distributions for the reconstructed parton-level decay are 0.75 and 0.65, respectively. Because the distribution in opening angles of the reconstructed decay is broad, using a smaller, fixed D means that some decays will not be reconstructed by the jet algorithm.
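Eq. (33) and the γ = 3 example can be checked with a few lines of arithmetic. This is a sketch under two stated assumptions: the boosted particle moves transverse to the beam, so that pT = m√(γ² − 1), and Eq. (33) is evaluated at the lower edge of each pT bin with a hypothetical m = 175 GeV for the top. The printed values (1.0, 0.7, 0.5, 0.39) sit close to the top entries of Table I, but the text does not state exactly how the table values were chosen, so the sketch does not try to reproduce them.

```python
import math


def tuned_d(m, pt, d_max=1.0):
    """Eq. (33): D = min(1.0, 2m / pT)."""
    return min(d_max, 2.0 * m / pt)


def d_from_boost(gamma):
    """2m/pT written in terms of the boost, assuming the particle moves
    transverse to the beam so that pT = m * sqrt(gamma**2 - 1)."""
    return 2.0 / math.sqrt(gamma**2 - 1.0)


print(round(d_from_boost(3.0), 3))  # 0.707, the D = 0.7 quoted for gamma = 3

# Eq. (33) at the lower edge of each top pT bin, with an assumed m = 175 GeV.
for pt in (200.0, 500.0, 700.0, 900.0):
    print(pt, round(tuned_d(175.0, pt), 2))
```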

The difference between the case of constant D = 1.0 and variable D is readily apparent. When the D value is matched to the expected opening angle of the decay, the improvements from pruning are flatter over the whole range in pT, and generally decrease towards high pT. The decreased efficiency for pruning, especially for the kT algorithm, is outweighed by the increases in R and S over the whole range in pT.

FIG. 35: Relative statistical measures w_rel, ε, R, and S vs. pT for W's and tops, using CA and kT jets. Instead of a fixed D = 1.0, a tuned D is used for each pT bin (see Table I). Statistical errors are shown.

Although pruning shows improvements over a broad range in pT for both constant and variable D, we want to compare the results of the two approaches directly. This serves as an indicator of how sensitive the final pruned jet is to the value of D in the jet algorithm.



D. Comparing Pruning with Different D Values 

In the previous two subsections we have seen that an 
unpruned analysis performs much better when D is tuned 
to the m/pT of the signal. We now consider whether this 
is true of a pruned analysis. 

In each pT bin, we can compare the results for pruned jets with D = 1.0 to those for pruned jets using a value of D fit to the expected size of the decay. Because the naive expectation is that the tuned value of D will yield better separation from background, we measure the improvements from pruning with tuned D relative to pruning with a fixed D of 1.0. Analogous metrics w_D, ε_D, R_D, and S_D are used, but now they compare the results from pruning with the tuned D value to the results from pruning with D = 1.0. For instance,

$$R_D = \frac{S/B \text{ from pruning with tuned } D}{S/B \text{ from pruning with } D = 1.0} \qquad (34)$$

Note that a value of ε_D, R_D, or S_D above 1 (or of w_D below 1) indicates that tuning D yields an improvement. The values of these four measures are shown in Fig. 36 over the range of pT. Since the tuned value of D in the smallest pT bin is 1.0, the comparison there is trivial and is not shown.

These results show only small improvements in S_D, with the statistical error bars at most data points including the value S_D = 1. They indicate that the improvements after pruning are roughly independent of the value of D used in the jet algorithm, as long as that D is large enough to fit the expected size of the decay in a single jet. From the point of view of heavy particle searches, we conclude that pruning removes much of the D dependence of the jet algorithm.



E. Absolute Measures of Pruning

So far, we have only considered measures of pruning relative to a similar analysis without pruning, because this factors out much of the dependence on details of the samples. However, several recent studies report absolute performance metrics for heavy particle identification, so we examine similar measures here for completeness. In addition, we directly compare the CA and kT algorithms, with and without pruning.

As can be seen from the plots of w_rel in previous sections, pruning reduces the width of the mass distribution for heavy particles. In Figs. 37a, 37b, and 37c we plot the absolute widths of the fitted mass distributions for both the top and the W in the tt̄ sample and for the W in the WW sample, over all pT bins. We plot this width for the pruned and unpruned versions of the CA and kT algorithms.

FIG. 36: Relative statistical measures w_D, ε_D, R_D, and S_D vs. pT for W's and tops, using CA and kT jets. The measures now compare pruning with a tuned D value in each pT bin to pruning with a fixed D. Statistical errors are shown.

Note that the heavy particle identification method we use in this work selects jets within a mass range of width 2Γ, with Γ coming from a fit to the signal sample. This gives a mass range cut that is typically much narrower than the fixed-width ranges used in other studies, and hence the absolute efficiency to identify heavy particles is lower.



In Figs. 38a and 38b we plot the absolute efficiency to identify tops and W's in the two signal samples for both algorithms, with and without pruning. For the top sample, this efficiency ε_abs is the ratio

$$\epsilon_{\rm abs} = \frac{\#\text{ of top jets in the signal sample}}{\#\text{ of parton-level tops in the } p_T \text{ range}} \qquad (35)$$

for each pT bin, with ε_abs defined analogously for the W sample. Because the substructure of the W decay is much simpler than that of the top decay, with no secondary mass cut, the absolute identification efficiencies are similar for all algorithms.

The efficiency to find top quarks is only meaningful when compared to the fake rate for QCD jets to be misidentified as a top quark. We define this fake rate as

$$\epsilon_{\rm fake} = \frac{\#\text{ of fake top jets in the background sample}}{\#\text{ of unpruned jets in the } p_T \text{ range}} \qquad (36)$$

for each pT bin, and analogously for the W sample. In Figs. 38c and 38d we plot ε_fake for tops and W's in the two background samples for both algorithms, with and without pruning. The fake rate is significantly reduced for pruned jets compared to unpruned jets, for both the top and W studies. The decrease in absolute efficiency arising from the narrow mass window is compensated by a correspondingly small fake rate for QCD jets.
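Eqs. (35) and (36) are simple ratios, but one detail matters for comparing pruned and unpruned results: the fake-rate denominator counts unpruned jets in both cases, so the two fake rates share a common normalization and their ratio is just the ratio of mistagged counts. The sketch below illustrates this with hypothetical function names and counts.

```python
def absolute_efficiency(n_tagged_signal, n_parton_level):
    """Eq. (35): tagged top (or W) jets over parton-level tops (or W's) in the pT bin."""
    return n_tagged_signal / n_parton_level


def fake_rate(n_tagged_background, n_unpruned_background_jets):
    """Eq. (36): mistagged QCD jets over *unpruned* jets in the pT bin."""
    return n_tagged_background / n_unpruned_background_jets


# Hypothetical counts for one pT bin.
n_jets = 20000  # unpruned QCD jets: the same denominator for both versions
print(fake_rate(400, n_jets), fake_rate(100, n_jets))  # 0.02 0.005
print(absolute_efficiency(300, 1000))  # 0.3
```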
FIG. 38: ε_abs and ε_fake vs. pT bin for the CA and kT algorithms with and without pruning, using D = 1.0. A "p" before the algorithm name denotes the pruned version. The legend for figure (a) applies to figures (b) and (d).



F. Algorithm Comparison 

Throughout this paper, we have studied how pruning compares to not pruning for the CA and kT algorithms. However, it is also of interest to study how the CA and kT algorithms compare to each other, with and without pruning. To do this, we use statistical measures w_A, ε_A, R_A, and S_A analogous to w_rel, ε, R, and S. For instance,

$$R_A = \frac{S/B \text{ from the CA algorithm with } D = 1.0}{S/B \text{ from the } k_T \text{ algorithm with } D = 1.0} \qquad (37)$$

We change the subscript to pA to compare the pruned versions of the algorithms, e.g.,

$$R_{pA} = \frac{S/B \text{ from pruned CA with } D = 1.0}{S/B \text{ from pruned } k_T \text{ with } D = 1.0} \qquad (38)$$

In Fig. 39 we plot the measures comparing CA to kT and pruned CA to pruned kT for both the WW and tt̄ samples.

These comparisons illustrate many of the effects that we have observed throughout the studies in this paper. In the unpruned comparison, CA tends to have a much lower efficiency to identify tops than kT. As pT increases, CA performs more poorly relative to kT, with its efficiency decreasing significantly. This arises because CA has a decreasing efficiency to identify the W at high pT, when the top quark decay becomes more localized in the fixed-D jet. Pruning corrects for this, though the performance of CA relative to kT still decreases at high pT.

The WW sample is instructive because it lets us compare the effectiveness of pruning between CA and kT across a wide range in pT. For the unpruned algorithms, the performance of CA relative to kT is fairly consistent over all pT, reflecting the fact that W identification is simpler than top identification, with accurate mass reconstruction the only requirement. However, when the jets are pruned, the performance of pruned CA relative to pruned kT improves in the smallest pT bin and worsens in the largest pT bin, compared with CA versus kT for unpruned jets. This skewing of the statistical measures indicates that pruning is more effective for CA than kT at small pT, where threshold effects are important, and more effective for kT than CA at large pT.



G. Detector Effects 

So far, no detector simulation has been applied to the simulated events aside from clustering particles into massless calorimeter cells. We now consider a technique that approximates the impact of detector resolution on the effectiveness of pruning. We modify our top and W jet analyses by smearing the energy E of each calorimeter cell: E is replaced by a value sampled from a Gaussian distribution with mean E and standard deviation σ given by

$$\sigma(E) = \sqrt{a^2 E + b^2 + c^2 E^2} \qquad (39)$$

We consider a parameter set motivated by the expected ATLAS hadronic calorimeter resolution [21], {a, b, c} = {0.65, 0.5, 0.03}. One obvious effect of the detector smearing is degraded mass resolution. In Fig. 40 we show this effect by plotting the jet mass distribution for the tt̄ sample in the first pT bin. Even after smearing, however, pruning improves the jet mass resolution. In Fig. 41 we plot the pruned and unpruned jet mass distributions for the tt̄ sample in the first pT bin, with smearing applied. Because the QCD jet mass distribution is smooth, smearing changes only the overall size of the QCD sample in the mass window, so we do not plot these distributions.
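The smearing model of Eq. (39) takes only a few lines to sketch. Energies are assumed to be in GeV, the function names are hypothetical, and clipping negative draws to zero is an assumption of this sketch rather than something stated in the text.

```python
import math
import random

# {a, b, c} from the text; energies assumed to be in GeV.
A, B, C = 0.65, 0.5, 0.03


def resolution(energy, a=A, b=B, c=C):
    """Eq. (39): sigma(E) = sqrt(a^2 E + b^2 + c^2 E^2)."""
    return math.sqrt(a**2 * energy + b**2 + c**2 * energy**2)


def smear_cells(cell_energies, rng):
    """Replace each cell energy E by a draw from Gaussian(E, sigma(E)).

    Clipping negative draws to zero is an assumption of this sketch."""
    return [max(0.0, rng.gauss(e, resolution(e))) for e in cell_energies]


rng = random.Random(42)  # fixed seed so the example is reproducible
print(round(resolution(100.0), 2))  # 7.18, i.e. about 7% at 100 GeV
print(smear_cells([1.0, 10.0, 100.0], rng))
```

At low energies the constant term b dominates σ(E), while at high energies the c·E term sets a floor of about 3% on the relative resolution.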

In Fig. 42 we repeat the basic analysis of Sec. VIII B, applying the detector smearing described above to events using D = 1.0 over all four pT bins and plotting the measures w_rel, ε, R, and S. This figure can be compared to Fig. 34 from the previous analysis, which plots the same measures when no energy smearing is used. The improvements are very similar to those for unsmeared jets, which is good evidence that pruning may retain its utility in a more realistic detector simulation or in real data.

FIG. 39: Relative statistical measures comparing CA to kT jets and pruned CA to pruned kT jets vs. pT for W's and tops, using D = 1.0. Statistical errors are shown.

FIG. 40: Distribution in jet mass for tt̄ events, with (dashed) and without (solid) energy smearing. The jets have pT of 200-500 GeV and D = 1.0, and there is no pruning.

FIG. 41: Distribution in jet mass for pruned (dashed) and unpruned (solid) jets, for tt̄ events with energy smearing. The jets have pT of 200-500 GeV and D = 1.0.



IX. CONCLUSIONS AND FUTURE PROSPECTS

In this work, we have demonstrated that recombination jet algorithms shape the substructure of heavy particles reconstructed in single jets. We have identified regions in the variables z and ΔR where individual recombinations are unlikely to represent the kinematics of a reconstructed heavy particle. Specifically, soft, large-angle recombinations are unlikely to arise from the accurate reconstruction of a heavy particle decay, and are likely to come from QCD jets, uncorrelated radiation, or systematic effects of the jet algorithm. For the CA algorithm, we have demonstrated that these soft, large-angle recombinations are a key systematic effect that shapes the substructure of the jet, in particular the final recombinations.

We have presented a procedure, called pruning, that eliminates soft, large-angle recombinations from the substructure of the jet. Using hadronically decaying top quarks and W bosons as test cases, we have demonstrated that the pruning procedure improves the separation between heavy particle decays and a QCD multijet background. We have motivated the parameters of the pruning procedure and demonstrated that they roughly optimize the improvements from pruning in our study for both top quarks and W bosons.

FIG. 42: Relative statistical measures w_rel, ε, R, and S vs. pT for W's and tops, using CA and kT jets. Calorimeter cell energies are smeared as described in the text. Statistical errors are shown.

Our studies have demonstrated many positive results of the pruning procedure. In a heavy particle search, the jet algorithm is sensitive to the parameter D, and if the value of D is not well matched to the decay of a heavy particle, the ability to identify that particle in single jets is greatly reduced. Our results indicate that pruning removes much of the jet algorithm's dependence on D. Pruning shows improvements even when D is adjusted to fit the expected decay of the heavy particle. We have demonstrated that pruning is insensitive to the effects of the underlying event, as the underlying event mainly contributes soft, uncorrelated radiation to a jet. Additionally, we have shown that the results of pruning are robust to a basic energy smearing applied to the calorimeter cells used to seed the jet algorithm. Finally, we have quantified absolute measures of the pruning procedure that can be used for comparison with other jet substructure methods.

It should be reiterated that pruning systematizes
methods that have been proposed by other authors for
specific searches. Pruning should be applicable to a wide
range of searches and is intended to be a generic jet
analysis tool. We have explained why pruning works and
why it should be used, and presented an in-depth
discussion of many of the physics issues that arise
when studying jet substructure.



A. Future Prospects 

The conclusions in this paper, like those for any anal- 
ysis technique not demonstrated on real data, must be 
taken cautiously. This is especially true for studies like 
this one on jet substructure, where a majority of the work 
has been in exploring techniques that may — or may not 
— actually be useful in an experiment. However, new 
techniques like jet substructure offer great promise. All 
studies thus far indicate that jet substructure, and in 
general a more innovative approach to jets, will be a use- 
ful tool for understanding the physics in events with jets 
at collider experiments. 

The most obvious and immediate application of pruning,
and of jet substructure tools in general, is in the
rediscovery of the Standard Model at the LHC. As the LHC
collects data from high-energy collisions, there will be an
abundant sample of high-$p_T$ top quarks and W and Z
bosons with fully hadronic decays. As these channels are
observed using standard analyses, jet substructure
techniques can be applied and tested. These channels can
also serve as key calibration tools for jet substructure
methods applied in the search for new physics.

From the theoretical side, improvements in jet-based
analyses can come from a variety of sources. As
calculations in perturbative QCD progress, they can be used
to improve predictions for jet-based observables in QCD.
Improved Monte Carlo tools, such as the continued
implementation of next-to-leading order matrix elements
and better parton showers, will lead to more accurate
studies and a better understanding of jet physics.
Additionally, the framework of soft-collinear effective
theory (SCET) can improve the understanding of QCD jets
[23–27]. As SCET is adapted to describe a wider variety
of event topologies and realistic jet algorithms are
implemented in the effective theory, it can be used to
calculate resummed predictions for jet-based observables
and accurately describe processes that are difficult to
access with perturbative QCD [30, 31, 32]. Jets will likely
play a central role in new physics searches at the LHC,
and a better understanding of jets and jet substructure
can aid in the discovery process.
APPENDIX A: COMPUTATIONAL DETAILS

We give a brief summary of the computational tools
used for the studies in this paper. We generate LHC
(14 TeV) events using MadGraph/MadEvent V4.4.21 [33]
interfaced with Pythia v6.4 [34]. We employ MLM-style
matching, implemented in MadGraph (see, e.g., [35]), on
the backgrounds. We have checked that our matching
parameters are reasonable using the tool MatchChecker
[36]. We use the DWT tune [37] in Pythia to give a "noisy"
underlying event (UE). For the hadron-level studies in
Secs. III and IV we exclude the underlying event by
setting the Pythia parameter MSTP(81) to zero, turning
off multiple interactions. The UE comparisons in Sec. V
compare samples with this parameter set to 0 or 1. No
detector simulation is performed, so we can isolate the
"best case" effects of our method. In Sec. VIII G we
examine the effects of Gaussian smearing on the energies
of final-state particles from Pythia to get a sense of how
much the results may change with a detector.
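The kind of Gaussian energy smearing described above can be sketched in a few lines of Python. This section does not give the resolution used. The calorimeter-style form σ_E/E = a/√E ⊕ b and the values a = 0.5 GeV^1/2 and b = 0.03 are therefore hypothetical placeholders, as is the helper name smear_energy.

```python
import math
import random


def smear_energy(e, a=0.5, b=0.03, rng=random):
    """Smear energy e (GeV) with a Gaussian of width e * (a/sqrt(e) (+) b).

    Hypothetical helper: a (stochastic term) and b (constant term) are
    placeholder values, not the resolution used in the study.
    """
    if e <= 0.0:
        return 0.0
    sigma = e * math.hypot(a / math.sqrt(e), b)  # terms added in quadrature
    return max(rng.gauss(e, sigma), 0.0)  # clamp so energies stay non-negative
```

Passing a seeded random.Random instance as rng makes the smeared sample reproducible. Setting a = b = 0 returns the input energy unchanged, which is a convenient check that the smearing is the only change.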

For the W study, the signal sample is $W^+W^-$ pair
production, with exactly one W required to decay
leptonically. The background is a matched sample of a W
and one or two light partons (gluons and the four lightest
quarks) before showering. These partons must be in the
central region, $|\eta| < 2.5$, where $\eta$ is the
pseudorapidity, $\eta = \ln\cot(\theta/2)$, with $\theta$ the
polar angle with respect to the beam direction ($\eta = y$
for massless particles, where $y$ is the rapidity). Signal
and background samples are divided into four bins:
[125, 200], [200, 275], [275, 350], and [350, 425] (all in
GeV). Each bin is defined by a $p_T$ cut that is applied
to single jets in the analysis. These bins confine the W
boost to a narrow range and allow us to study the
performance of pruning as the jet $p_T$ (or W boost) varies.
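The kinematic definitions above translate directly into code. The sketch below computes η = ln cot(θ/2) and the rapidity y = ½ ln((E + p_z)/(E − p_z)), applies the |η| < 2.5 centrality requirement, and assigns a jet to one of the four W-study p_T bins. Treating each bin as half-open [lo, hi) is an assumption, since the text does not say which edge is inclusive. The function names are hypothetical.

```python
import math

W_BINS = [(125, 200), (200, 275), (275, 350), (350, 425)]  # jet pT bins, GeV


def pseudorapidity(theta):
    """eta = ln(cot(theta / 2)) for a polar angle theta in (0, pi)."""
    return math.log(1.0 / math.tan(theta / 2.0))


def rapidity(e, pz):
    """y = (1/2) ln((E + pz) / (E - pz))."""
    return 0.5 * math.log((e + pz) / (e - pz))


def is_central(eta, eta_max=2.5):
    """Parton centrality requirement |eta| < 2.5."""
    return abs(eta) < eta_max


def pt_bin(pt, bins=W_BINS):
    """Index of the bin containing pt (bins assumed half-open), else None."""
    for index, (lo, hi) in enumerate(bins):
        if lo <= pt < hi:
            return index
    return None
```

For a massless particle, E = |p| and p_z = |p| cos θ. In that case rapidity(E, p_z) agrees with pseudorapidity(θ) to floating-point precision, which illustrates the statement η = y.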

For each $p_T$ bin $[p_T^{\min}, p_T^{\max}]$, both samples are
generated with a $p_T$ cut on the leptonic W of $p_T^{\min} - 25$
GeV. For the background, we set the matching scales
$(Q^{\rm ME}_{\rm cut}, Q_{\rm match})$ to be (10, 15) GeV in all four bins.
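The per-bin generator settings for the W study can be written as a small configuration table. The values come from the text. The dictionary layout and the function name w_generation_settings are hypothetical choices for illustration.

```python
W_BINS = [(125, 200), (200, 275), (275, 350), (350, 425)]  # GeV


def w_generation_settings(bins=W_BINS):
    """Per-bin generator-level settings for the W study (all in GeV)."""
    return [
        {
            "bin": (lo, hi),
            "leptonic_w_pt_cut": lo - 25,  # pT^min - 25 GeV on the leptonic W
            "matching_scales": (10, 15),   # background only, same in every bin
        }
        for lo, hi in bins
    ]
```

Across the four bins, this gives leptonic W cuts of 100, 175, 250, and 325 GeV.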

For the top quark reconstruction study, the signal
sample is $t\bar{t}$ production with fully hadronic decays. The
background is a matched sample of QCD multijet production
with two, three, or four light partons, with the same cut
on parton centrality as in the W study. Samples are
again divided into four bins: [200, 500], [500, 700],
[700, 900], and [900, 1100] (all in GeV).

We generate signal and background samples with a
parton-level $H_T$ cut for generation efficiency, where $H_T$
is the scalar sum of all $p_T$ in the event. For each bin
$[p_T^{\min}, p_T^{\max}]$, the parton-level $H_T$ cut is
$p_T^{\min} - 25~{\rm GeV} < H_T/2 < p_T^{\max} + 100~{\rm GeV}$.
For the background, we use matching scales of (20, 30) GeV
for the smallest $p_T$ bin and (50, 70) GeV in the other three bins.
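The $H_T$ generation window for the top study can be checked with a short Python sketch. It takes a list of parton $p_T$ values, forms $H_T$ as their scalar sum, and tests $p_T^{\min} - 25 < H_T/2 < p_T^{\max} + 100$ (GeV) for a given bin. The function names are hypothetical, and the sketch assumes the caller supplies the parton-level $p_T$ values.

```python
TOP_BINS = [(200, 500), (500, 700), (700, 900), (900, 1100)]  # GeV


def scalar_ht(parton_pts):
    """H_T: scalar sum of parton transverse momenta (GeV)."""
    return sum(parton_pts)


def passes_ht_window(parton_pts, pt_min, pt_max):
    """Generation cut pT^min - 25 GeV < H_T / 2 < pT^max + 100 GeV."""
    half_ht = scalar_ht(parton_pts) / 2.0
    return pt_min - 25.0 < half_ht < pt_max + 100.0


def background_matching_scales(bin_index):
    """(20, 30) GeV in the lowest pT bin, (50, 70) GeV in the other three."""
    return (20, 30) if bin_index == 0 else (50, 70)
```

For example, partons with 300 and 280 GeV give $H_T/2$ = 290 GeV. This passes the window for the [200, 500] bin (175 to 600 GeV) but not for the [500, 700] bin (475 to 800 GeV).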

From the hadron-level output of Pythia, we group
final-state particles into "cells" based on the
segmentation of the ATLAS hadronic calorimeter ($\Delta\eta = 0.1$,
$\Delta\phi = 0.1$ in the central region). We sum the
four-momenta of all particles in each cell and rescale the
resulting three-momentum to make the cell massless. After
a threshold cut on the cell energy of 1 GeV, cells become
the inputs to the jet algorithm. Our implementation of
recombination algorithms uses FastJet [38], with a
pruning plugin we have written [39].
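The cell construction described above can be sketched in Python as follows. The sketch bins particles on a uniform 0.1 × 0.1 grid in (η, φ) across the whole detector. That is an assumption: the text quotes this granularity only for the central region, real ATLAS geometry differs, and because 2π is not a multiple of 0.1, the last φ column here is slightly narrower. Each cell keeps its summed energy, and its three-momentum is rescaled so that |p| = E. Cells below 1 GeV are dropped. The helper name make_cells is hypothetical.

```python
import math
from collections import defaultdict

CELL_SIZE = 0.1  # assumed uniform cell size in both eta and phi


def make_cells(particles, cell_size=CELL_SIZE, e_min=1.0):
    """Group (px, py, pz, E) particles into massless (eta, phi) cells.

    Hypothetical helper; the uniform grid is an assumption.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    for px, py, pz, e in particles:
        pt = math.hypot(px, py)
        if pt == 0.0:
            continue  # exactly along the beam: no defined pseudorapidity
        eta = math.asinh(pz / pt)                 # pseudorapidity
        phi = math.atan2(py, px) % (2 * math.pi)  # map phi into [0, 2*pi)
        key = (math.floor(eta / cell_size), math.floor(phi / cell_size))
        cell = sums[key]
        cell[0] += px
        cell[1] += py
        cell[2] += pz
        cell[3] += e
    cells = []
    for px, py, pz, e in sums.values():
        p = math.sqrt(px * px + py * py + pz * pz)
        if e < e_min or p == 0.0:
            continue  # 1 GeV cell-energy threshold (and guard against p = 0)
        scale = e / p  # keep E, rescale |p| to E so the cell is massless
        cells.append((px * scale, py * scale, pz * scale, e))
    return cells
```

The returned massless cells are what would be handed to a jet algorithm, such as the prune-on-recluster step of a CA clustering.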

Several of the plots in early sections involve mass
cuts on jets. The details of these cuts are provided in
Sec. VII A.